received from all over the world for this issue. We understand that the confirmation of final acceptance, to 
the authors / contributors, is delayed, but we also hope that you concur with us that quality 
review is a time-consuming process and is further delayed if the reviewers are senior researchers in their 
respective fields and hence are hard pressed for time. 


We wish to express our sincere gratitude to our panel of experts for steering the submitted manuscripts 
through multiple cycles of review and bringing out the best from the contributing authors. We thank our 
esteemed authors for having shown confidence in BIJIT and considering it a platform to showcase and 
share their original research work. We would also like to thank the authors whose papers were not 
published in this issue of the Journal, probably because of minor shortcomings. However, we would 
like to encourage them to actively contribute to the forthcoming issues. 


The Quality Assurance Process undertaken involved a series of well-defined activities that, we hope, went a 
long way in ensuring the quality of the publication. Still, there is always scope for improvement, and so 
we request the contributors and readers to kindly mail us their criticism, suggestions and feedback at 
bijit@bvicam.ac.in and help us in further enhancing the quality of forthcoming issues. 
ABSTRACT 

This paper presents a novel control approach that lets a 
Humanoid Robot work safely in critical situations, such as a 
bad light environment, using Visual Teleoperation. To this end, 
modeling environments for a Humanoid Teleoperation System 
are developed. The virtual reality modeling environment 
includes a virtual Humanoid BHR-2 and virtual objects such 
as a table. The main goal of this work is to enhance our 
visual teleoperation system for BHR-2 in order to avoid any 
collision during real time operation. The software Maya is 
used for modeling and simulation. Maya plug-ins in VC++ 
provide efficient modeling rules, real time interaction, and 
a time saving rendering approach in a virtual environment. 
The validity of the proposed scheme is shown by conducting 
experiments using an offline step over trajectory to avoid an 
obstacle in a bad light environment. 


KEYWORDS 
Virtual Reality, Step over Trajectory, Visual Teleoperation 


1. INTRODUCTION 

In a Teleoperation system, a human operator controls and 
monitors a remote robot, which interacts with an environment 
while relaying information back to the human. A fundamental 
requirement for Teleoperation is high-fidelity video 
information. Cameras are usually unable to provide 
complete vision feedback, especially in a bad lighting 
environment. Virtual Reality (VR) can be a better approach 
for controlling the robot in such a situation. If a 
computer-generated picture is substituted for the video 
picture, the viewer can be made to feel present (virtual 
presence, virtual environment, or virtual reality). 

Work related to modeling within virtual reality [1] 
displays the capability of VR to serve as a creative tool. 
The method includes drawing 3D lines using a tracker, surfaces 
based on 3D curves, and 3D objects based on 2D sketches. 
In [2] the hardware components to implement teleoperator 
presence with a head-mounted display were developed and 
evaluated. Head position was measured within a worksite. 
This drove a 7-DOF telemanipulator. Implementing this for 
the whole body is very difficult. 

Use of virtual reality in both the modeling and animation 
process is described in [3]. 

In order to plan motions, 3D knowledge of the environment 
is needed. A Humanoid robot needs a 3D representation of the 
world so that it can step on and over obstacles [4], [5], go 
through narrow spaces, and crawl [6]. The main goal is to 
enhance the information available to the remote operator. 
In teleoperation, it is necessary that the perceptions from a 
physically remote environment be conveyed to the human 
operator in a realistic manner. This differs from virtual 
reality, in which the perception from a simulated 
environment is conveyed to the user. Thus virtual 
environments and teleoperation share much of the same 
user interface, but in teleoperation the need for detailed world 
modeling is less fundamental [7], [8]. 

For better performance of Humanoid teleoperation, it is 
desirable to provide a complete scene of the robot and its 
worksite to the operator. One approach used real video 
images as feedback [9]. Some other teleoperation systems for 
humanoid robots displayed the real images captured by the 
cameras on the robot [10], [11], etc. These systems are easy 
to develop but are not suitable for environments (e.g. full 
of smoke) in which the camera cannot capture images clearly 
enough for the operator to complete the task. 

A better visualization of teleoperation information, i.e. 
continuously available 3D graphics, can be displayed for the 
robot's location and its environment using Virtual Reality. 
Researchers focused on building virtual models of the robot 
and rendering their configuration [12], [13]. But these two 
systems did not render the robot's external data relative to its 
worksite. 

Many types of software are available for VE. Maya can be 
widely used as a visual modeling tool. Maya uses the MEL 
scripting language for components such as dialog boxes and 
tools, proprietary file formats and plug-ins to simplify 
modeling and animation. This technique helps to ensure the 
data is updated in an efficient and controlled manner [14]. 

In a virtual reality based Teleoperation system, real data from 
the environment experienced via a teleoperator and 
simulated data experienced via a VE simulation can be 
fused via digital processing to produce an intermediate 
environment of real and simulated objects. In this work, 
modeling refers to the data that are used to record the 
geometrical information for the environment. This 
information includes the shape of the objects in the 
environment, their physical properties, and their interaction 
with the environment and the user for visual presentation of 
the environment. The main goal of building a virtual 
environment is to describe interactions as well as the 
visualization of the environment. 

The existing Humanoid BHR-2 teleoperation system has 
four kinds of feedback: body sensor data of the robot, 
feedback from the robot vision system, the real scene of the 
overall workspace, and a virtual scene monitoring system 
based on a motion capture system [15]. In this virtual scene 
monitoring system, data is fed back to the operator without 
simulation, so it is difficult to monitor the robot in a 
critical situation. 

In this paper, the work of [16] has been enhanced to develop a 
complete graphical simulation environment / visual 
environment using the software Maya to teleoperate and monitor 
a real Humanoid under circumstances where robot vision 
is not enough to avoid an obstacle, as in a bad light 
environment. Thus our system becomes, through the 
visualisation choice, a teleoperation system. 

This paper is organised as follows: the virtual scene modeling 
technique is described in Section 2. In Section 3 the method 
of building a virtual Humanoid is described, while Section 4 
describes the procedure to develop virtual objects. In Section 
5 the motion capture system is discussed. Section 6 presents 
the Simulation Environment and the experimental results, and 
in Section 7 conclusions are presented. 

2. VIRTUAL SCENE MODELING 

In this section, the virtual scene modeling method is discussed. 
Maya software is selected for this work. It has a modeling 
function and a rendering function. After the surface and the 
skeleton of a character are built, it can render any motion in 
the virtual scene. It has an interface for adding data 
processing or other functions. The virtual scene includes the 
virtual BHR-2 and virtual furniture such as a table, chair, 
cupboard, stool, or simply any block, for the visual 
teleoperation system. 

2.1 Modeling Transformation 

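3D modeling transformations, represented by 4x4 matrices 
for scale, rotate, translate, shear, etc., are used. For example, 
the following matrix can be used: 

[Equation (1): 4x4 transformation matrix, illegible in the source] 

Animation of a rigid body can be defined as an arrangement of 
two transforms, or a hierarchy of transforms, as under: 

positionMatrix * rotationMatrix (2) 

[Equation (3): positionMatrix, a 4x4 matrix, illegible in the source] 

For a rotation by angle a about a unit axis (x, y, z), the 
rotationMatrix has the standard axis-angle form (written here 
in the column-vector convention): 

$$
\text{rotationMatrix} =
\begin{pmatrix}
t x^2 + c & t x y - s z & t x z + s y & 0 \\
t x y + s z & t y^2 + c & t y z - s x & 0 \\
t x z - s y & t y z + s x & t z^2 + c & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

where x, y and z are the components of the unit vector 
along the axis, s = sin a, c = cos a and t = 1 - c. See [17] 
for more details. 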
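
As an illustration of the composition positionMatrix * rotationMatrix, the following minimal Python sketch builds an axis-angle rotation from s, c and t as defined above and combines it with a translation. The function names, the column-vector convention and the layout of the translation matrix are assumptions made for this example; they are not taken from the paper or from the Maya API.

```python
import math

def rotation_matrix(axis, angle):
    """4x4 rotation by `angle` (radians) about a unit axis (x, y, z).
    Column-vector convention (an assumption of this sketch)."""
    x, y, z = axis
    s, c = math.sin(angle), math.cos(angle)
    t = 1.0 - c
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y, 0.0],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x, 0.0],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c,     0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def position_matrix(px, py, pz):
    """4x4 translation; the layout is assumed, not recovered from the paper."""
    return [
        [1.0, 0.0, 0.0, px],
        [0.0, 1.0, 0.0, py],
        [0.0, 0.0, 1.0, pz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_point(m, p):
    """Apply a 4x4 transform to a 3D point (homogeneous coordinate 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# Rotate 90 degrees about the z axis, then translate by (1, 2, 3).
rigid = matmul(position_matrix(1.0, 2.0, 3.0),
               rotation_matrix((0.0, 0.0, 1.0), math.pi / 2))
moved = transform_point(rigid, (1.0, 0.0, 0.0))
```

In this convention the rotation is applied first and the translation second, so the point (1, 0, 0) is first turned onto the y axis and then shifted.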


3. VIRTUAL HUMANOID BHR-2 
First, a virtual skeleton model (the two skeletons model) 
like the BHR-2, which has 32 DOF, has been developed. The 
setup of the DOF is similar to that of the real robot. In the Maya 
software the value of each DOF can be changed, so it can 
display the relative movement of two adjacent skeletons. 
Position and attitude data are accepted by the robot skeleton 
itself. 
After building the skeleton system, the surface of the robot is 
built with the Maya tools and attached to the skeleton, which 
completes the whole robot model (see Fig. 1). After developing 
the robot model, attributes are added to accept the 
values. The model can render the robot motion and use the 
motion data of the robot. 

A data processing plug-in is developed using Maya to 
obtain the motion data from a data file which is updated in 
time by the teleoperation platform feedback module. In the 
plug-in the motion data are assigned to the attributes of the 
model joints. 

The state of the robot can be rendered using real-time joint 
angle data and real-time position and attitude data. The 
body sensor data is fed back to the platform while executing 
the order. The data can then be used directly in the virtual 
scene. 

In the motion capture system, markers/sensors are used to 
determine the coordinate data of the markers on the robot 
body. The coordinate data of only 3 markers attached 
to the robot body are enough to obtain the position and the 
attitude data of the robot body. 

The virtual scene of the robot helps the operator in the 
following manner: 

I. The most important function of the virtual scene is to 
monitor the robot in real time. By rendering the real-time 
feedback from the robot, the virtual scene expresses the 
real situation of the robot and parts of its environment 
instead of the video picture from a real camera. 

II. The operator can change the view point easily to see the 
details of the robot in the environment. The exact state of 
the robot will be known. 


4. VIRTUAL OBJECTS 
The procedure to make a virtual model of furniture, such as a 
table, and to track it in teleoperation is discussed here. 


The following modeling operations are used to make the 3D 
shape of a table: 

Sculpting (with either the NURBS or Polygon sculpting tool, or 
by moving vertices, faces, CVs, or edit points), lofting, 
revolving (lathing), and extruding. 

The following steps are involved in drawing a virtual table in 
Maya. 

Step 1: Use the "Create Curve" tool to draw a curve. This 
curve is just the profile of a proportioned object. To draw 
this curve, choose the "CV Curve" or "EP Curve" 
tool from "Create" and then sketch the outline of one side of 
a table leg on the right hand side of the Y axis in the "Side 
(XY)" view. After finishing the line, press Enter (see Fig. 2). 
Step 2: To make objects that are symmetrical around one 
axis (in this case a table leg), the Revolve tool is used. 
The result of the revolve can be edited by changing the 
attributes in the channels window after clicking "revolve1" 
under the input labels. Thus a table leg appears (Fig. 3). 
Using the Lofting tool, a surface between two or more curves 
can be created. To create a few curves that lie in different 
planes, an easy way is to create one curve, duplicate it, 
and move the duplicates away from each other (see Fig. 4). 
Objects can easily be bent, twisted, tapered, sheared, etc. 
Select Deform > Create Nonlinear > Bend. Now click and 
drag to bend the object around the axes of the view being 
dragged in. 

Step 3: Create a table top and position it over the table legs 
by creating a cube and scaling it accordingly, or by changing 
the values under the inputs in the channel window (see 
Fig. 5). 

Step 4: Give a sufficient amount of thickness to the table top 
so that the legs penetrate into the table top. Select only the 
top edges of the table top for beveling the top of the table. 
Finally a table is created as shown in Fig. 6. Using this 
technique, the other models can be developed. 


5. MOTION CAPTURE SYSTEM 

A motion capture system can be used to track objects. The real 
time server processes the data from the motion capture 
hardware to provide client applications with Cartesian 
coordinates of the markers. Client code interprets how the 
markers and rigid bodies define the position and orientation of 
tracked objects. A motion capture system records the 
position and orientation of a performer using a number of 
sensors attached to the body. The sensors may be 
mechanical, magnetic, or optical. The client can then make 
calls to the tracked object to receive the rigid body motion, 
which allows the captured motion to be translated to a digital 
character. Tracked objects based on marker data require 
clients to specify how the view, up and right vectors are 
defined. By combining two markers, the client defines a 
vector. These vectors are added as constraints to the view, up 
or right vectors of the tracked object using cross product or 
average mode. The client specifies the position of the tracked 
object by specifying a list of markers and an associated mode. 
The server requires at least three markers to track a rigid body. 
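
The way three markers can define the position and orientation of a rigid body may be sketched as follows. This is only one possible construction under stated assumptions: the position is taken as the average of the markers, the view vector joins two markers, and the up and right vectors come from cross products. The function name `rigid_body_pose` and the marker ordering are hypothetical and are not part of the motion capture server described in the paper.

```python
import math

def sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Scale a vector to unit length (fails for a zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def rigid_body_pose(m0, m1, m2):
    """Position and orientation from three non-collinear markers.
    Assumed conventions: position = average of the markers,
    view = direction m0 -> m1, up = normal of the marker plane,
    right = up x view."""
    position = tuple((a + b + c) / 3.0 for a, b, c in zip(m0, m1, m2))
    view = normalize(sub(m1, m0))
    up = normalize(cross(view, sub(m2, m0)))
    right = cross(up, view)
    return position, view, up, right

# Three markers lying in the z = 0 plane.
position, view, up, right = rigid_body_pose((0.0, 0.0, 0.0),
                                            (2.0, 0.0, 0.0),
                                            (0.0, 3.0, 0.0))
```

If the three markers are collinear the cross product vanishes and the orientation is undefined, which is consistent with the requirement of at least three (non-collinear) markers per rigid body.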

For instance, a virtual chair can be moved by placing markers 
on a physical chair. 

The marker based scheme supports tracking / rendering the 
shape of rigid and non-rigid bodies and is much more 
flexible. A plug-in system in Maya allows the user to save 
data for a virtual environment with the same ease as saving a 
document from a word processor. 

By the motion capture system, we can obtain the robot motion 
data. The data is expressed as the coordinates of markers 
which are attached to the body. 

Data structures that are used to record the geometrical 
information for the environment include the shape of the 
objects in the environment, their moving parts and physical 
properties, and the behaviors that they can perform (how 
they interact with other objects in the environment and with 
the user). A data-server device is used to get this data. A 
plug-in system is developed to save the data for a virtual 
environment. 

Using the teleoperation platform, the data of the robot joint 
angles can be sent back to the operator in real time. The 
real-time body sensor data and the motion data are transferred 
to the teleoperation platform. In the real-time data fusion 
module, these feedbacks are processed into integrated data 
which the 3D interface can render. They are finally rendered 
by the virtual model as the animation. In fusing the feedback 
data, the strategy is that the robot body sensor data are 
rendered directly. In fact, from the motion data of more than 
3 markers attached to the rigid body, we can calculate the 
position and attitude data of the whole robot. After this data 
is calculated it is rendered. For more details refer to [15]. 

The joystick can be interfaced to allow the user to specify the 
location of key frames and to modify the current time. 

Here a simple method has been discussed for loading and 
interacting with 3D models of the robot environment in order 
to operate and monitor easily. 


6. SIMULATION ENVIRONMENT 

Fig. 7 shows the simulation environment for humanoid BHR-2 
Teleoperation. The virtual / simulation environment shows that, 
through simulation, the overall behavior of the robot system 
can be visualized and tested under a variety of 
circumstances. 

On this platform, the operator watches the virtual scene 
displayed on a flat screen and can remotely control the 
humanoid robot, using the control interface, to complete 
tasks such as walking, e.g. stepping over an obstacle. 

The goal is to visualize the robot model for collision free 
maneuvering to avoid obstacles in poor visibility 
situations. 


6.1 Evaluation and experimental results 

A motion capture system based on 8 infrared cameras is 
installed at the working site to get the motion data of the 
humanoid robot. There are two computers at the working 
site. The left one is the remote cockpit computer and the 
right one is the virtual scene computer. The virtual scene 
fuses the multiple kinds of real feedback data from the robot 
and renders the result on the virtual model of the robot. 

In the experiment, the robot begins a task using an offline 
walking trajectory and is commanded with a safe step over 
trajectory (a trajectory which will bring the robot to step 
over an obstacle) for safely reaching a ball on a table. We 
refer to such a trajectory as a Safety Stopping Trajectory. The 
operator inputs the walking instruction to the robot by 
keyboard / joystick. The new motion is calculated and will 
bring the robot to a stop step over. A simulated behaviour of 
the walking system is shown in Fig. 8. When the operator 
detects a potentially dangerous collision, issuing the step 
over command causes the robot to follow a step over trajectory. 
Experimental timing results show that a step over trajectory 
can be performed in roughly 10-20 msec. The simulation scene 
is shown in Fig. 8, whereas the corresponding snapshots of the 
experiment are shown in Fig. 9. 

From start to step over, Fig. 9 depicts snapshots of: 

1. The beginning of the walking motion, with stepping over 
the obstacle starting from the left foot. 

2. The end of the walking motion, after the right foot has 
stepped over the obstacle. 

The results of these experiments proved the effectiveness of 
the controlling method for the visual humanoid teleoperation 
system. 

Because of the real time stability control, the actual motion 
trajectory differs from the designed trajectory. The virtual 
scene renders the two kinds of data in the simulation scene 
and in the real monitoring scene. The difference can be 
distinguished by the operator by observing the details of the 
robot in the two scenes. 


7. CONCLUSIONS 

In this paper a visual environment for the Humanoid BHR-2 
Teleoperation System is developed for safe / collision free 
manoeuvring. The scene modeling procedure has been 
described to create the 3D models of objects in the scene. 
The interface between the virtual BHR-2 and the real BHR-2 
was developed for rendering the data using teleoperation. 
The simulation environment shows that, through simulation, 
the overall behavior of the robot system can be visualized and 
tested under a variety of circumstances. Experimental results 
proved that the operator's sense of presence is enhanced for a 
task in case of poor visibility. 
ABSTRACT 

Information hiding has emerged as an important research 
field to resolve the problems in network security, quality of 
service control and secure communications through public 
and private channels. Keeping the network in a desired state 
is the utmost requirement of network communications. Work 
is being done in different fields to achieve this goal. 
Steganography is one of the branches of information hiding 
that is used to solve this problem. In this paper we present a 
Steganographic algorithm based on wavelet transforms. Our 
algorithm first uses the Best T-codes to encode the message 
before embedding it into a cover image. One advantage of 
this is that we can embed high capacity messages into the 
cover objects. The second advantage of using T-codes is the 
self-synchronization attained at the decoding stage. To 
achieve better imperceptibility of the stego-image, we 
have embedded the encoded message into the cover image 
using the wavelet fusion technique more than once, each time 
selecting the wavelet block pixels using pseudo random 
permutations. From the experimental results we have 
observed that the algorithm is imperceptible and can have 
100% embedding capacity. 


KEYWORDS 
Steganography, SSVLC, DWT, PSNR 


1. INTRODUCTION 

In this information era, on either a public network or a private 
network, one requires a tool that allows communicating 
over these channels while also providing security and 
robustness for the hidden data. Information hiding has 
emerged as a useful and important field for resolving the 
problems of public network security and secure 
communications. There are three main streams of research 
on which this field is focused at present: 
Steganography, Watermarking and Cryptography. In 
Cryptography, the data is encrypted so that it cannot be 
understood by anyone else. The encrypted data is unreadable 
but is not hidden from eavesdroppers. Though the 
purpose of Cryptography is to protect the data (or 
information) from unwanted attackers, it does not ensure 
covertness on the channel. Steganography solves this 
problem by embedding data in the cover object so that it is 
hard to detect. The purpose of Watermarking is to embed a 
watermark for copyright protection, 
authentication and tamper proofing. 

There are mainly four requirements of any information 
hiding technique, namely Imperceptibility, Capacity, 
Security and Robustness. Imperceptibility means that 
human eyes cannot distinguish the difference between the 
stego-image and the original image. Capacity refers to the 
amount of data that can be embedded in the cover object. 
Security means that an eavesdropper cannot detect the 
hidden data, and Robustness requires that the hidden data 
can be recovered within certain acceptable errors even when 
the stego-image has endured some signal processing or 
noise. 

Nowadays Cryptography or Source encoding methods 
have also been used in conjunction with Steganography to 
provide an additional layer of security. Over time the 
information hiding techniques have improved to meet the 
desired goal. Digital steganography provides privacy for 
intelligence and military personnel and for people who are 
subject to censorship. 

There are various domains of information hiding, viz. the 
spatial domain, transform domain and spread spectrum domain. 
The transform domain based hiding techniques not only have 
the potential to achieve higher capacity than the spatial 
domain based techniques, they are also found to be more 
robust. 

Apart from text, images have been used widely as cover 
objects for the purpose of information hiding, as their digital 
representation provides a high degree of redundancy. The most 
popular transform domain Steganography systems 
are based on the discrete Fourier transform (DFT), discrete 
cosine transform (DCT), discrete wavelet transform (DWT), 
singular value decomposition (SVD) transform and discrete 
Hadamard transform (DHT). These techniques are 
independent of image formats and hide data in more 
significant areas of the transformed image. The details about 
these techniques can be found in [1-3, 9, 10, 19, 21, 29]. 

In this paper we present a Steganographic method based on 
the wavelet transform. We first use best self-synchronizing 
T-codes to encode the original text. The purpose of using 
T-codes lies in their inherent self-synchronizing property. 
According to [25], T-codes require anything between 1.5 and 
3 symbols to attain synchronization following a lock loss. 
Also, sending the message in the cover image in compressed 
form increases its security as well as the embedding 
capacity. The secret message is then embedded into the cover 
image using the wavelet-fusion technique [26] with a 
stego-key. To increase the quality (hence, the PSNR value) of 
the stego-image and so meet the imperceptibility requirement 
of steganography, we embed the message in the cover image a 
number of times, each time using a pseudo random number 
generator to select the pixel locations in the block. In the 
extracting algorithm we obtain the hidden message by taking 
the average of the messages extracted from the stego-image 
using the stego-key. To check the robustness of the algorithm, 
we have analyzed it against noise such as Salt and Pepper, 
Gaussian and Speckle, and found satisfactory results. 


2. SELF-SYNCHRONIZING VARIABLE LENGTH CODES 
The categories of coding that minimize redundancy of 
information are Entropy coding, Source coding and Hybrid 
coding. Entropy coding is a lossless process whereas source 
coding is a lossy process. Most multimedia systems apply 
Hybrid coding techniques. The most popular variable length 
codes (VLC) used for lossless compression are Huffman 
codes. However, when an uncorrected error occurs in the 
encoded data it may propagate to the extent that all 
subsequent data are lost. Thus, one requires a VLC with the 
property that the data may resynchronize automatically, with 
minimum delay, after an error occurs. Another problem, 
known as the slippage problem [16], arises if the number of 
symbols decoded before resynchronization differs from the 
actual number of data symbols that were encoded. The 
slippage problem may lead to misinterpretation of the 
remaining data, even though they may have been received 
correctly. 
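
The effect of a single bit error on a variable length code, including slippage, can be illustrated with a small Python sketch. The four-symbol prefix code below is a hypothetical example chosen for this illustration; it is neither a Huffman code for any particular source nor one of the T-code sets used in this paper.

```python
# Hypothetical prefix code, used only for this illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {bits: symbol for symbol, bits in CODE.items()}

def encode(message):
    """Concatenate the code words of all symbols."""
    return "".join(CODE[symbol] for symbol in message)

def decode(bitstream):
    """Greedy prefix decoding; trailing bits that do not complete
    a code word are dropped."""
    out, buf = [], ""
    for bit in bitstream:
        buf += bit
        if buf in DECODE:
            out.append(DECODE[buf])
            buf = ""
    return "".join(out)

def flip(bitstream, index):
    """Return the bit string with one bit inverted (a channel error)."""
    flipped = "1" if bitstream[index] == "0" else "0"
    return bitstream[:index] + flipped + bitstream[index + 1:]

message = "abacabad"
sent = encode(message)
received = flip(sent, 1)      # corrupt the second transmitted bit
decoded = decode(received)    # "aaaacabad"
```

Here the decoder falls back into step after a few symbols, so the tail "cabad" is decoded correctly, but nine symbols are produced instead of eight. That change in symbol count is the slippage described above: the later symbols are correct yet sit at the wrong positions.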
A number of methods have been proposed to solve the 
synchronization problem. Some of the proposed 
techniques used restart markers, but they increase the 
overhead, i.e. the bit rate. Thus, researchers realized that a 
VLC that provides synchronization without an increase in 
overhead is needed. Gavin R. Higgie [7], Mark R. Titchener 
[23] and A.C.M. Fong [5] proposed a self-synchronizing 
VLC, viz. the T-code. According to Titchener [25], T-codes 
resynchronize within one to three code words. Ulrich Gunther 
[27] and P. Reddy [20] have shown that T-codes exhibit better 
synchronization properties when compared to Huffman 
codes. A.C.M. Fong et al. have proposed the application of 
minimal sync-delay T-codes for information source coding. 
G.Y. Hong et al. [8] have also investigated the application of 
self-synchronizing VLC (SSVLC). 
S.K. Muttoo and Sushil Kumar [11-13] have shown the 
application of Best T-codes in two popular 
steganographic algorithms, Jpeg-Jsteg [28] and OutGuess 
0.1 [17]. 


A. T-codes 

T-codes are families of VLCs that exhibit an extraordinarily 
strong tendency towards self-synchronization. The concept 
of 'simple T-codes' was given by M.R. Titchener [23]. He 
proposed a novel recursive construction of T-codes known 
as the 'Generalized T-codes' that retain the property of 
self-synchronization [24]. Each T-augmentation step is 
characterized by two parameters: a 'T-prefix' p, a codeword 
from the existing T-code, and a 'T-expansion parameter' k, a 
positive integer. Starting at augmentation level 0 with the 
initial set S = {0, 1}, the construction of T-codes at 
augmentation levels 1, 2 and 3 is summarized in the table below: 

[Table: T-code sets at augmentation levels 1, 2 and 3] 
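
A T-augmentation step with T-prefix p and T-expansion parameter k can be sketched in Python as follows. The sketch uses the commonly stated form of generalized T-augmentation, in which the new set consists of the words p^i s for every s in S other than p and every i from 0 to k, together with p^(k+1). The function names and the (p, k) choices in the example are illustrative assumptions and are not the parameters used in this paper.

```python
def t_augment(code_set, prefix, k=1):
    """One generalized T-augmentation step.
    Returns { prefix^i + s : s in code_set, s != prefix, 0 <= i <= k }
    together with prefix^(k+1)."""
    if prefix not in code_set:
        raise ValueError("the T-prefix must be a code word of the set")
    if k < 1:
        raise ValueError("the T-expansion parameter must be positive")
    rest = [word for word in code_set if word != prefix]
    result = []
    for i in range(k + 1):
        result.extend(prefix * i + word for word in rest)
    result.append(prefix * (k + 1))
    return result

def is_prefix_free(code_set):
    """True if no code word is a proper prefix of another."""
    return not any(a != b and b.startswith(a)
                   for a in code_set for b in code_set)

level0 = ["0", "1"]
level1 = t_augment(level0, "0", 1)   # illustrative choice p = "0", k = 1
level2 = t_augment(level1, "1", 1)   # illustrative choice p = "1", k = 1
```

With these choices level 1 is {1, 01, 00} and level 2 is {01, 00, 101, 100, 11}. Each step removes the prefix from the set and grows the set size from n to (k + 1)(n - 1) + 1 while keeping it prefix-free.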

There can be many possible code sets matching a source, 
depending on the parameters (p, k) chosen [24]. Apart from 
the generalized class of self-synchronizing efficient codes, 
T-codes show the best synchronization performance 
amongst the most efficient VLCs and require anything 
between 1.5 and 3 characters to attain synchronization 
following a lock loss. Among the subgroups of T-codes, a 
best T-code set is one that is optimally efficient and at the 
same time exhibits the least synchronization delay. Different 
T-codes exhibit different degrees of synchronization 
performance, even if they have the same average code word 
length. The Expected (or Average) synchronization delay 
(ESD or ASD) is normally used as the measure of 
synchronization performance. The ESD is defined as the 
average number of symbols in S that the decoder has to 
receive before it can conclude that it has achieved 
synchronization with respect to its largest level set. 
A number of attempts have been made to quantify the 
synchronization performance of different T-codes [25, 27, 5]. 

Ulrich Gunther [27], in his thesis, has given a recursive 
search algorithm that yields the T-code set with the 
minimum redundancy for a given source. This search 
algorithm utilizes equivalence and feasibility criteria to 
significantly restrict the search space. The best T-codes used 
in our algorithms in this paper are based on the breadth-first 
search algorithm proposed by Ulrich Gunther [27]. Gunther 
chooses the least redundant set from a pool of all possible 
T-code sets by calculating the redundancy of each of them. The 
search process is optimized by certain proposed constraints. 
The algorithm returns a group of code sets with least 
redundancy. To choose the best code set, i.e. the one with the 
least synchronization delay, we test each code set against a 
very long test message string (composed of source symbols) by 
calculating the ESD. 


3. THE PROPOSED ALGORITHM 

A large number of image Steganographic methods have been 
proposed over the last few years to achieve better 
imperceptibility, the best data hiding rate, survivability and 
security. Most of these embedding algorithms in a 
transform domain make use of the DFT, DCT, DWT or DHT. 
Eric A. Silva and Sos S. Agaian [22] have embedded data in 
different transform domains and observed that the Haar 
wavelet transform is the best choice compared to the FFT, 
DCT or DHT for their method. However, they observed that 
the relative performance of each of the transforms used was 
uniform across all images tested. 

Our proposed method is a high capacity image 
steganographic method using the wavelet-fusion method 
proposed by M. Fahmy Tolba and Al-said Ghonemy [26]. 
The proposed algorithm consists of four parts: Encoding, 
Embedding, Extraction and Decoding. Our algorithm 
provides multi-level security. First, in the encoding stage, we 
apply Best T-codes on the message for source coding. An 
encoded key is used for this purpose. The secret (encoded) 
message is then embedded in the cover image using the 
wavelet-fusion technique. To enhance the quality of the 
stego-image we embed the message in the cover image a 
number of times. The stego-key is used to select random 
pixels for embedding the message. We require the stego-key 
to extract the hidden message. Finally, in the decoding stage, 
the original message is obtained with the help of the encoded 
key. The steps of these algorithms are described in Figures 
3.1 and 3.2. 

The Embedding algorithm can be summarized as follows: 

0. Input the cover image and the original text (or message).

1. Normalize the cover image, i.e. scale the pixel values
   to lie between 0.0 and 1.0.

2. Apply preprocessing to the cover image: choose 'alpha'
   (preferably between 0 and 0.1) and reconstruct the pixels
   to lie in the range [alpha, 1 - alpha]. This ensures that
   pixels from the fused coefficients (during embedding)
   do not go out of range, and hence that the secret message
   is recovered correctly.

3. Apply the 2D Haar transform on each color plane
   separately.

4. Encode the original message using best T-codes. The
   resulting secret message is a bit-stream of 0s and 1s,
   denoted by (m_1, m_2, ..., m_n), where n is the embedded
   message length.

5. Generate a pseudorandom permutation, using a stego-key,
   of size equal to the length of the cover image.

6. Enter the number of times the message is to be embedded,
   num.

7. for i = 1 to num do
   7.1 Select a wavelet coefficient of the transformed image
       randomly, say f(j, k).
   7.2 Embed the secret message bit, m(i), into the
       transformed image in the following way:
       if m(i) = '1'
           f(j, k) = f(j, k) + alpha;
       else
           f(j, k) = f(j, k) - alpha;

8. Apply the inverse 2D Haar transform on each color
   plane.

9. Denormalize the image.

10. Output: the stego-image.
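
The embedding steps above can be illustrated with a minimal sketch for a single color plane. It is not the authors' implementation: the function names (`haar2d`, `ihaar2d`, `embed`), the averaging form of the one-level Haar transform, the linear rescaling used for the preprocessing step, and the use of Python's `random.Random` seeded with the stego-key are all assumptions made for illustration. T-code encoding and denormalization are left out.

```python
import random


def _transpose(m):
    return [list(col) for col in zip(*m)]


def haar2d(plane):
    """One-level 2D Haar transform (averaging/differencing form, assumed)."""
    def step(rows):
        out = []
        for r in rows:
            half = len(r) // 2
            avg = [(r[2 * i] + r[2 * i + 1]) / 2 for i in range(half)]
            diff = [(r[2 * i] - r[2 * i + 1]) / 2 for i in range(half)]
            out.append(avg + diff)
        return out
    # Transform the rows, then the columns.
    return _transpose(step(_transpose(step(plane))))


def ihaar2d(coeffs):
    """Inverse of haar2d."""
    def unstep(rows):
        out = []
        for r in rows:
            half = len(r) // 2
            rec = []
            for i in range(half):
                rec += [r[i] + r[half + i], r[i] - r[half + i]]
            out.append(rec)
        return out
    # Undo the column step first, then the row step.
    return unstep(_transpose(unstep(_transpose(coeffs))))


def embed(plane, bits, stego_key, alpha=0.05):
    """Embed bits into one square plane of 0..255 values (even side length).

    Returns the normalized stego plane and the coefficient positions used.
    """
    n = len(plane)
    if len(bits) > n * n:
        raise ValueError("message longer than the number of coefficients")
    # Steps 1-2: normalize to [0, 1] and squeeze into [alpha, 1 - alpha]
    # (a linear rescaling is assumed here).
    norm = [[alpha + (p / 255.0) * (1 - 2 * alpha) for p in row] for row in plane]
    coeffs = haar2d(norm)                                # step 3
    positions = [(j, k) for j in range(n) for k in range(n)]
    random.Random(stego_key).shuffle(positions)          # step 5: keyed permutation
    for bit, (j, k) in zip(bits, positions):             # step 7
        coeffs[j][k] += alpha if bit == 1 else -alpha
    return ihaar2d(coeffs), positions[:len(bits)]        # step 8
```

A receiver holding the same stego-key can regenerate the same permutation and therefore visit the same coefficients, which is what makes the key a prerequisite for extraction.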

The Extraction algorithm is just the reverse of the
embedding method. We can summarize it as follows:

1. Apply the 2D Haar transform on each color plane of the
   stego-image.

2. Enter num, the number of times the message was embedded.

3. Initialize the hidden message to zero.

4. for j = 1 to num do
   4.1 Select the embedded coefficients, i, using the
       PRNG based on the same stego-key as used in
       the embedding procedure.
   4.2 Extract the embedded value of alpha by [...]

Decode the hidden message using best T-codes with the
encoded key.

Output: Original message.


4. EXPERIMENTAL RESULTS

For testing our algorithm we have used images of 256 x 256
pixels. The values of alpha are taken from 0.05 to 0.5 and the
number of embeddings from 5 to 15. To measure
imperceptibility we use the PSNR, defined as follows:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right),$$

$$\mathrm{MSE} = \left(\frac{1}{N}\right)^2 \sum_{i}\sum_{j} \left(x_{ij} - x'_{ij}\right)^2,$$

where x denotes the original pixel value, and x' denotes the 
decoded pixel value. 
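
As a small illustration of the measure, the following sketch computes the MSE and PSNR for two equally sized grey-level images given as lists of rows. The function names are assumptions, and the mean is taken over all pixels, which matches the (1/N)^2 factor for an N x N image.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equally sized images (lists of rows)."""
    count = len(original) * len(original[0])
    total = sum((x - y) ** 2
                for row_o, row_d in zip(original, decoded)
                for x, y in zip(row_o, row_d))
    return total / count


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio; infinite when the images are identical."""
    error = mse(original, decoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

A higher PSNR means the stego-image is closer to the cover image.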

Some of the results are summarized below in Table 4.1
and Figure 4.1.


5. CONCLUSION

We observe that by choosing the value of alpha between 0 and
1, preferably 0.05, we can achieve the best imperceptibility.
We also observe that the PSNR values decrease as we increase
the number of times the message is embedded in the cover
image, but still remain in the acceptable range of 35 to 40.
Our algorithm provides maximum embedding capacity in the
cover image. The embedding capacity is equal to 3 times the
number of pixels contained in the color image, i.e., the
capacity percentage is 100%.

Our algorithm provides multi-level security.
We have encoded the message using self-synchronizing
T-codes with a key, and the encoded message is embedded in
the Haar wavelet transform coefficients of the image using
another key, called the stego-key. The wavelet-fusion
technique further uses a value of alpha. This value is secret
and shared by the sender and receiver. The value of alpha is
used to adjust the normalized cover's pixels. The advantage
of Best T-codes is seen at the decoding stage where, due to
their self-synchronizing property, we obtain the original
message even after signal processing noise has been added to
the stego-image.


6. NOISE ANALYSIS

We have analyzed our algorithm for robustness by adding
noise to stego-images in jpg format. The PSNR results
so obtained are summarized in Table 6.1 and Figure
6.1.
Figure 3.2: The block diagram of the message extraction
[Panel labels: (alpha=0.5, num=10), (alpha=0.09, num=15)]
paradigms used in distributed computing, such as the
client-server paradigm, the remote procedure paradigm and
the mobile agent paradigm. The client-server paradigm is
based upon the concept of a server, which serves the various
requests of the clients, and in the remote procedure call
approach a machine can connect to another machine and
retrieve information remotely. Mobile agent technology is
built upon the advancement in computing and communication
technology over wired and wireless networks. Mobile agents
are software programs that can migrate from one machine to
another in a homogeneous as well as heterogeneous
environment. They can migrate in connected as well as
disconnected networks. On each machine, the agent
interacts with stationary service agents and other resources
to accomplish its task. Mobile agents are particularly
attractive in distributed information retrieval applications.
By moving to the location of an information resource, the
agent can search the resource locally, eliminating the
transfer of intermediate results across the network; by this
property, mobile agents reduce the end-to-end latency. In
this paper, we try to point out the benefits and limitations of
these paradigms.


KEYWORDS 
Mobile agents, Client-Server, Remote Evaluation. 


1. INTRODUCTION

A mobile agent is a software program that can migrate
during execution from one machine to another in a
homogeneous as well as heterogeneous network. In other
words, an agent can suspend its execution, migrate to
another machine, and then resume execution on the new
machine from the same point at which it left off.
Mobile agents are platform dependent, so an agent platform
is needed on each machine, where the agent interacts with
stationary agents and other resources to accomplish its task.
There are two alternative approaches [1] to retrieving the
data: the code to data approach and the data to code approach.
The mobile agent paradigm performs better if the code size
is small enough; this model is being extended to support
different migration strategies resulting in less network traffic
and better response time. Mobile agents are not always
better than client-server calls. A mobile agent is only
beneficial if the space overhead of the mobile agent code is
not too large or if the wireless link connecting the mobile
user to the fixed servers [...].

In this paper, we compare mobile agents with classical
client-server techniques and other mobile-code systems. The
implementation of a mobile agent is easier than a traditional
client-server implementation. But one question arises: which
distributed computing paradigm is better, and why? So, let
us consider these paradigms one by one.


2. CLIENT - SERVER (CS) PARADIGM

Examples of traditional client-server middleware include
CORBA, RMI and DCOM.

In a classical CS paradigm, processing of the data mainly
takes place in the host client. In fact, the job of the server is
limited. The server executes only some basic procedures for
data retrieval and storage. Before being sent to the client,
data only undergo a soft initial filtering.

The host server behaves as a simple remote storage system.
Together with all of the other servers and the interconnection
network, it makes the whole system form a "big repository"
of information available to the different clients. The server
usually makes available some procedures for handling the
stored data, which are designed to meet criteria of
general effectiveness. The actual data processing is therefore
left to the host client, where the user can execute procedures
for the kind of processing desired. This type of CS scheme is
used when we want to create a very simple system from the
management point of view, or structures with a high level of
security. Such a paradigm is depicted in Figure 1. An
advantage of this architecture is the possibility of controlling
the type and the ways of access to the data stored in the
server. Consequently, security in the CS architecture is very
high. In this paper, we consider the following
question: is the mobile agent paradigm "better" than the
traditional client-server paradigm? In the next section, we
will try to find out why the mobile agent is better than other
paradigms.

In fact, if the user has specific requests concerning the modes
of data processing, and if the server does not provide for that
specific type of operation, the only possibility consists in
retrieving much more data than needed, and then performing
the operations of processing and selection in the client. In
these cases, the server provides a huge amount of documents
in order to assure a wide basis of selection. Of course, all that
causes an overload of both the server and the communication
system. In fact, the amount of data exchanged may be
considerable. Consequently, the host client must have its own
processing capability.


3. REMOTE EVALUATION (REV) PARADIGM

Unlike the typical Client - Server, the Remote Evaluation
(REV) paradigm implies that the server receives not only the
processing requests from the client, but also the whole code
needed for performing operations of selection on the data
stored. The response of the server, with no additional
overhead, is limited to sending the information that can
actually be used and is required by the client. The REV is
based on the code to data strategy; therefore it is better than
the CS paradigm.

Besides, since the user can use customized code in the
server, the data sent in output are ready for use, and they
only need negligible additional processing. From this point of
view, we can also think of an environment in which host
clients are equipped with minimal processing capabilities.
The initial cost is therefore higher in comparison with the CS
paradigm, and is localized in the opening stage of sessions.

In fact, the code for the data processing can be of
considerable size and we can easily assume that its size is
higher than a simple retrieval request. Of course, this cost is
counterbalanced by the reply stage (transmission of search
results from the server to the client), because the amount of
data passing through the network is more limited. During the
stage of design, a system with REV must be created by
considering more detailed aspects in comparison with the CS
paradigm. In fact, the processing architecture of the different
servers must be similar (or very well known), so that the
code sent by the client can be easily executed on all of the
hosts. From this point of view, we can think of a common
platform for code execution. This also implies the need for
creating protection elements that could assure a high level of
security.
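
The trade-off between the two paradigms can be made concrete with a toy traffic model. This is only a sketch under simplifying assumptions that are not in the paper: sizes are counted in bytes, the server-side filter keeps a fixed fraction (`selectivity`) of the stored data, and protocol overhead is ignored. The function names are hypothetical.

```python
def cs_traffic(request_bytes, data_bytes):
    """Client-server: a small request goes out, the unfiltered data comes back."""
    return request_bytes + data_bytes


def rev_traffic(request_bytes, code_bytes, data_bytes, selectivity):
    """Remote evaluation: request plus code go out, only selected data comes back."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must lie in [0, 1]")
    return request_bytes + code_bytes + data_bytes * selectivity


def rev_is_cheaper(request_bytes, code_bytes, data_bytes, selectivity):
    """True when shipping the code costs less than shipping the discarded data."""
    return (rev_traffic(request_bytes, code_bytes, data_bytes, selectivity)
            < cs_traffic(request_bytes, data_bytes))
```

Under this model REV wins exactly when the code is smaller than the data it filters out, which is the higher opening cost and cheaper reply stage described above.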


4. MOBILE AGENT (MA) PARADIGM

A Mobile Agent (MA) is executable code that can move
from one host to another according to the mobile agent
itinerary, which may be static or dynamic. Basically, a
mobile agent consists of three components [2]: code
state, data state, and execution state. Code is transferred
during the migration; some data state can also be transferred.
But execution state cannot be transferred over the network.
This way, there is a kind of suspension of the execution of
the program, waiting for the subsequent resume state [5] on a
remote machine. Both paradigms (Mobile Agent and Remote
Evaluation) use the same code to data strategy. The system of
Remote Evaluation is a more limited approach than the MA.
In fact, a code migration is present in the REV, but there is
always a direct interaction between the client and the server.
This means that the code sent by the client returns the data
directly to the source. Besides, when this operation is done,
the process is completed, so the context of execution of the
program is limited to the single host. Conversely, the mobile
agent system can be used for performing the research
[...] In fact, the agent has the
procedures for operating on the database according to the
ways desired by the user, and can also make independent
decisions, such as migrating to other sites or returning the
results obtained to the user, if they are considered sufficient.
In this sense, the interaction between the user and the agent is
limited to the stages of transmission and return of data. What
takes place within this time limit depends only on the way
the agent was designed.

By moving the code to the data (see Figure 3), a mobile
agent can reduce the latency of individual steps, avoid
network transmission of intermediate data, continue work
even in the presence of network disconnections, and
complete the overall task much faster than a traditional
client/server solution.

We can therefore expect that the amount of data transferred
in each migration tends to increase. The agent can decide to
limit the data considered interesting for the user dynamically,
even by discarding the data selected in the previous hosts.
Agents with a maximum quota of user data, which can be
moved in each migration, can therefore be designed.
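
The idea of an agent that searches locally on each host and carries at most a fixed quota of user data can be sketched as a small single-process simulation. The function `run_agent`, the representation of hosts as (name, records) pairs, and the policy of discarding the oldest results first are assumptions made for illustration; no real migration takes place.

```python
def run_agent(itinerary, is_interesting, quota):
    """Simulate an agent visiting hosts in order and carrying at most `quota` items.

    itinerary: list of (host_name, records) pairs, visited in order.
    is_interesting: predicate applied locally on each host.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    carried = []
    for host_name, records in itinerary:
        # Search the resource locally; only the hits leave the host.
        local_hits = [r for r in records if is_interesting(r)]
        carried.extend(local_hits)
        # Enforce the quota by discarding data selected on previous hosts
        # (oldest first, an assumed policy).
        carried = carried[-quota:]
    return carried
```

For example, with hosts holding the records 1 to 4 and 5 to 8, an "even numbers" filter and a quota of three, the agent returns 4, 6 and 8, having dropped 2 on the second host.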

A MA, shown in Figure 3, is an autonomous transportable
program (or object) that can migrate under its own or host
control from one node to another in a heterogeneous
network. In other words, the program running at a host can
suspend its execution at an arbitrary point, transfer itself to
another host (or request the host to transfer it to its next
destination) and resume execution from the point of
suspension.

A MA migrates from one host to another according to
itineraries [3, 4]. An itinerary may be either static or dynamic,
and is defined by parameters such as Agent Id, State Type,
Time and Place.

When the agent reaches a server, it is delivered to an agent
execution environment. Then, if the agent possesses the
necessary authentication credentials, its executable parts are
started.

The Mobile Agent paradigm of distributed computing is
different from other paradigms. In other
paradigms, independent processes collaborate by exchanging
data over their network links. With mobile agents, a process
is transported, carrying with it the shared data as it visits
individual processes on its itinerary.


4.1. LIFE CYCLE

The model of the mobile agent paradigm is based on the
migrating workflow [6, 7] system model. The resuming
instance is the task executor in the migrating workflow
system; it is a mobile agent in essence. Our
workflow-oriented life cycle model consists of five life states
(creating, running, deleting, suspending, resuming) and a
number of transitions (active, suspend, dispatch, resume,
terminate) between these states. The workflow-oriented life
cycle model of the mobile agent is shown in Figure 4.

To accomplish its task, the mobile agent can transport itself
to another server in search of the needed resource/service,
spawn new agents, or interact with other stationary agents.
Upon completion, the mobile agent delivers the results to the
sending client or to another server.

1. In the creating state, the agent is created but not
   activated yet.

2. In the running state, the agent is running, performing
   actions in pursuit of its goal.

3. In the deleting state, the agent is terminated.

4. In the suspending state, the agent cannot run and still
   stays within the agent server.

5. In the resuming state, the agent is travelling between
   two server instances.
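
The five states and five transitions named above can be sketched as a small state machine. The wiring between states is an assumption: the text lists the state and transition names, but the arrows of Figure 4 are not reproduced here, so the table `TRANSITIONS` and the class `AgentLifeCycle` are hypothetical.

```python
# Assumed wiring of (state, transition) -> next state. Only the names of the
# states and transitions come from the text; the arrows are hypothetical.
TRANSITIONS = {
    ("creating", "active"): "running",
    ("running", "suspend"): "suspending",
    ("suspending", "dispatch"): "resuming",   # agent travels between servers
    ("resuming", "resume"): "running",
    ("running", "terminate"): "deleting",
}


class AgentLifeCycle:
    """Tracks the life state of one agent and rejects undefined transitions."""

    def __init__(self):
        self.state = "creating"

    def fire(self, transition):
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise ValueError(f"'{transition}' is not allowed in state '{self.state}'")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the table as plain data makes it easy to replace the assumed arrows with those of the actual life cycle diagram.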


4.2. MOBILE AGENT LIFE STATE LOG
STRUCTURE

The life cycle of a MA begins at the moment when it is
created. The MA then migrates from one host to another
in order to achieve its goals, and finally returns to the server
on which it was created. Two or more states in the life
cycle of a MA may occur at different times or places.
The mobile agent life state log structure [7] can be defined as
a four-tuple:

Life State Log Structure = (Agent Id, State Type, Time,
Place), where

1. 'Agent Id' identifies the mobile agent to which a log
   item belongs;

2. 'State Type' indicates the type of mobile agent life state,
   with State Type ∈ STATUS = {Creating, Running,
   Suspending, Migrating, Deleting};

3. 'Time' indicates the time when the mobile agent
   (Agent Id) came to the current State Type;

4. 'Place' identifies the agent server where the mobile
   agent (Agent Id) came to the current State Type at the
   specific time.
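
The four-tuple maps naturally onto a small record type. The sketch below is illustrative only: the class and field names, the use of a float timestamp, and the helper `history` are assumptions, while the five state names follow the STATUS set given above.

```python
from dataclasses import dataclass
from enum import Enum


class StateType(Enum):
    CREATING = "Creating"
    RUNNING = "Running"
    SUSPENDING = "Suspending"
    MIGRATING = "Migrating"
    DELETING = "Deleting"


@dataclass(frozen=True)
class LifeStateLog:
    agent_id: str          # which mobile agent the log item belongs to
    state_type: StateType  # life state the agent entered
    time: float            # when the agent entered that state
    place: str             # agent server where this happened


def history(log, agent_id):
    """Return the log items of one agent in chronological order."""
    return sorted((item for item in log if item.agent_id == agent_id),
                  key=lambda item: item.time)
```

Sorting the items of one agent by time reconstructs where the agent has been and in which state.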


4.3. APPLICATION AREAS OF MOBILE AGENTS

Mobile agents provide effective and flexible mechanisms for
structuring distributed systems. The mobile agent paradigm
can be exploited in a variety of ways, ranging from low-level
system administration tasks, to middleware, to user-level
applications. They can be mapped directly to real life
situations.

The concept of a mobile agent can be applied to
Information Retrieval Systems (IRS), Distributed File
Systems, Clinical Data Analysis for medical diagnosis,
Distributed Data Mining, Distributed Real-time Systems,
Mobile Wireless Environments, Mobile Smart Databases,
Peer-to-Peer Computing, Network Monitoring and
Management, Intrusion Detection Systems, Network Routing,
Performing Location-dependent Computations, Load
Balancing, Service Customization, Wireless Sensor
Networks (WSN), Remote Sensing, Wireless Ad hoc
Networks (WAHN), Manufacturing, Command & Control,
Grid Computing/Cluster Computing, and Information
Dissemination.
5. COMPARISONS

Table 1. Comparison between various Distributed
Computing Paradigms


6. CONCLUSION

In this paper we have discussed the three basic
paradigms of distributed computing, namely Client-Server,
Remote Evaluation and Mobile Agent. CS implementations
are suitable for small applications where a small amount of
information is retrieved from a few remote servers having
low processing delays. However, most real-world
applications require a large amount of information to be
retrieved and significant processing at the server. MAs scale
effectively as the size of the data to be processed and the
number of servers the data is obtained from increase.

We conclude that the mobile agent paradigm is better than
the other paradigms: it consumes fewer resources, but has a
limitation on the size of the code. So, it can be used
extensively in a code-to-data environment. This paradigm
can be exploited in many application areas, such as data
mining and weather forecasting.


7. REFERENCES 

[1] A. Puliafito, S. Riccobene, M. Scarpa “An analytical 
comparison of the client-server, remote evaluation and 
mobile agents paradigms”, Proc. of the First 
International Symposium on Agent Systems and 
Applications. Page: 278-84, 1999. ISBN:0-7695-0340-3. 

[2] Carzaniga, A.; Picco, G.P.; Vigna, G., “Is Code Still
Moving Around? Looking Back at a Decade of Code
Mobility”, Software Engineering - Companion, 2007.
29th International Conference on Software Engineering,
20-26 May 2007, Pp 9-20, Digital Object Identifier
10.1109/ICSECOMPANION.2007.44.

[3] Daniela Rus, Devika Subramanian, "Information 
Retrieval, Information Structure and Information 
Agents", ACM Computing Surveys (CSUR) archive 
Volume 27, Issue 4 , (December 1995) Pp: 627 — 629. 
ISSN:0360-0300. 

[4] L. Miao and H. Qi and F. Wang, “Self-deployable 
mobile sensor networks for on-demand surveillance", 
Sensors, and Command, Control, Communications, and 
Intelligence (C3I) Technologies for Homeland Security 
and Homeland Defense IV, at SPIE Defense and 
Security Symposium, vol. 5778, 2005. 

[5] Umar, A., “A Comparison of Mobile Agent and Client- 
server Paradigms For Information Retrieval in Virtual 
Environments”, Proc. of First International Conference, 
“Next Generation Enterprises: Virtual Organizations 
and Mobile/pervasive Technologies”, April 2000. 

[6] Shinichi Motomura, Takao Kawamura, Kazunori
Sugahara, "Persistency for Java-based Mobile Agent
Systems", Proc. Third International Conference on
Internet and Web Applications and Services. Pp. 470-475,
2008, ISBN: 978-0-7695-3163-2.

[7] YANG Gong-ping, ZENG Guang-zhou, “Mobile 
Agent Life State Management", IMACS Multi- 
conference on Computational Engineering in 
Systems Applications(CESA), October 4-6, 2006, 
Beijing, China. 








Figure 4. Life cycle of Mobile Agent

Figure 3. Mobile Agent Architecture

Copy Right © BIJIT - 2009


BVICAM's International Journal of Information Technology


Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi 


Service Oriented Architecture for Business Dynamics: An Agent Based Business 
Modeling Approach 


O. P. Rishi 
Birla Institute of Technology, Mesra, BIT Jaipur Campus (INDIA) 


ABSTRACT 

In today's rapidly changing environment, industries are
interested in executing business functions that have scope in
multiple applications. Business dynamics and technological
innovations have left organizations to comply with a
disparate mix of operating systems, applications and
databases. This makes it difficult, time-consuming and costly
for IT departments to deliver new applications that integrate
heterogeneous technologies. It demands high
interoperability and more flexible and adaptive business
process management. The inclination is to have systems
assembled from a loosely coupled collection of Web services,
which are universal and integrated. This technical area
appears to have scope where Agent Technology can be
exploited with significant advantages. With Service Oriented
Architecture a decomposable architecture, and associated
[...]
composed of loosely coupled services communicating via
pre-established protocols, these services can be assembled
ad hoc to form customized applications that address a wide
variety of business requirements.

In the present paper, we propose a conceptual framework
for agent-based Service Oriented Architecture (SOA), in
which we try to integrate Service Oriented Architecture with
agent technology and other tactical technologies like web
services, business workflow services, business meta-rules,
search optimization of services and semantic Web
technology for business service mappings.


KEYWORDS 
Multi-agent systems; Service oriented architecture; Business 
workflow & services; Business dynamics. 


1. INTRODUCTION 

Today the technology world believes that adoption of a
Service Oriented Architecture (SOA) paradigm is strategic
and should be part of most software projects. Agent
technology is considered to be the most successful
technology supporting Service Oriented Architecture. It is
known that agent technology is used to implement complex
systems and applications that are communication-centric,
based on distributed computational and information systems,
and requiring autonomous components readily adaptable to
changes. Agents play the role of efficiently supporting
distributed computing and allow the dynamic
composition of Web services [10, 11]. It is now desired that
agent technology integrate with other enterprise computing
technologies to improve computational proficiency.
There are several unseen technical issues and the existing
technology has significant limitations. Yet, prototype
systems based on the underlying infrastructure can help to
increase awareness of these issues and to set down possible
solutions.

In an agent based Service Oriented Architecture approach
the scenario would be characterized mainly by three actors:
Service Providers, Business Process Manager and Users, as
shown in figure 7.


2. SERVICE ORIENTED ARCHITECTURE AND 
ITS ROLE IN ORGANIZATIONAL 
COMPUTATION 

Service Oriented Architecture is a paradigm for organizing
and utilizing distributed capabilities that may be under the
control of different domains. SOA is an approach or strategy
in which applications rely on services available in a network
such as the World Wide Web. It can be considered a
way of sharing functions (typically business functions) in a
widespread and flexible way. In other words, SOA can be
defined as a group of services which communicate with each
other. It uses services available in a network and promotes
loose coupling between software components so that they can
be reused. Applications in SOA are built on services,
where a service is an implementation of business
functionality that can then be consumed by
clients in different applications or business processes [11].
In the SOA framework, service modeling includes [4]:
1. Service Oriented Enterprise
2. Service Oriented Architecture
3. Service Oriented Computing
SO Enterprise: The Service Oriented Enterprise (SOE) is a
new model for architecting software and IT infrastructure. It
allows a business to view itself from the perspective of its
customers, suppliers and other trading partners. The business
value derived from this approach includes cost savings,
flexibility and the ability to respond more quickly to
marketplace changes.
SO Architecture: A service-oriented architecture is
essentially a collection of services. These services
communicate with each other. The communication can
involve either simple data passing or two or
more services coordinating some activity. Some means of
connecting services to each other is needed.

SO Computing: Service-oriented computing provides a way
to create a new architecture that reflects components'
tendencies toward autonomy and heterogeneity.
Normally, business process environments which do not
use the service oriented architecture lack the interaction of
multiple services at the same time to exchange messages or
to perform some task. Using the SOA environment, the
following benefits can be drawn [4, 11]:
• Reuse of services, enabled by the decoupling of service
  providers and service consumers
• Structured description of interfaces
• Discoverability of services through the registry
• Incremental deployment and maintenance
• Architectural partitioning that allows the service
  provider to be modified or even replaced without impact
  on the service consumer
• Flexibility and agility, facilitated by allowing multiple
  services to be composed quickly into more complex
  services and allowing the process flow between services
  to be configured dynamically
Service-Oriented Architecture is preferred when there is a
need for request-reply, real time integration between
systems, and more than two systems are involved in the
integration. Similarly, Service-Oriented Architecture is also
preferred when a service being provided is a likely candidate
for reuse, and the service implementation requires no advance
knowledge of the service client [10].
Many challenges are faced when we adopt SOA. Managing
service metadata, which includes the exchange of messages to
perform tasks, the generation of millions of messages, and
managing and providing information on how services
interact, is a complicated task. Testing in the SOA space is
also lacking: sophisticated tools that provide testability
of all headless services (including message and database
services along with web services) are not available today, no
testing framework is available that would provide the
visibility required to find faults in the architecture, and there
is no provision for appropriate levels of security [6].
The need for the proposed model/framework arises from
the above challenges. Usually the design framework of SOA
does not maintain or use agents, but in our proposed model
we have tried to build the service oriented architecture
(SOA) on various business process agents, thereby
making a model which comprises an agent based service
oriented architecture.
In the agent based SOA framework, the following
architectural principles for design and service definition focus
on specific themes that influence the innate behavior of a
system [4, 6]:
• Many web services are encapsulated to be used under
  the SOA architecture.
• Services maintain a relationship that minimizes
  dependencies, hence exhibiting loose coupling.
• Services adhere to a communications agreement as
  defined in the service contract.
• The logic of a service is hidden / abstracted from the
  outside world.
• Logic is divided into services with the intention of
  promoting reuse.
• Collections of services can be coordinated and
  assembled to form composite services.
• Services have control over the logic they encapsulate,
  thereby exhibiting service autonomy.
• High-quality services are preferred over low-quality
  ones for service optimization.


Figure 1 shows the service components of service oriented
architecture in a business process.

The functionality of SOA revolves around business processes
and is packaged as interoperable services. SOA also describes
IT infrastructure which allows different applications to
exchange data with one another as they participate in
business processes. The aim is to have loose coupling of
services with operating systems, programming languages
and other technologies. Web Services are the set of
protocols by which services can be published, discovered
and used in a technology neutral, standard form. Services are
what you connect together using Web Services. A service is
the endpoint of a connection. Also, a service has some type
of underlying computer system that supports the connection
offered. Service is the important concept [8, 13]. Figure 2
shows the connection between services and service providers,
whereas figure 3 shows the mapping of services between
business partners.

SOA separates functions into distinct units, or services, and
makes them available on a network so that they can be
combined and reused in business applications. These
services communicate with each other by passing data from
one service to another, or by coordinating an activity
between two or more services. SOA concepts are usually
built upon older concepts of distributed computing and
modular programming [14].


3. SERVICE ORIENTED ARCHITECTURE AND 
WEB TECHNOLOGY 

The technology of Web services is the connection technology
for service-oriented architectures. The service provider
returns a response message to the service consumer. The
request and subsequent response connections are defined in
a way that is understandable to both the service consumer
and the service provider. A service provider can also be a
service consumer. The term Web Services refers to the
technologies that allow for making connections. Services are
what we connect together using Web Services [2, 4]. A
service is the endpoint of a connection and has some type of
underlying computer system that supports the connection
offered. The combination of services, internal and external
to an organization, makes up a service-oriented
architecture. The relation between organizational services
and web technologies is shown in figure 4.

In general, business entities offer capabilities and act as
service providers [3, 8]. Those who make use of services are
referred to as service consumers. The service description
allows prospective consumers to decide if the service is
suitable for their current needs. Although SOA is commonly
implemented using web services, services can be made
visible, support interaction, and generate effects through
other implementation strategies. Web service-based
architectures and technologies are specific and concrete [15].


4. WEB SERVICES AND AGENT BASED 
SERVICE ORIENTED ARCHITECTURE 

In web service-based architectures the service providers can 

register the instance of the service in the registry making it 

available to service consumers. The service consumer тау 
then query the registry in order to retrieve the binding 
information required to access the service. The service 
consumer then invokes the service. The relationship between 

a service provider and consumer is dynamic and established 

at runtime by a binding mechanism. Dynamic binding 

minimizes the dependencies between the service consumer 
and the service provider. Service Oriented Environment is 

based on the following major principals [1, 6]: 

• Service is the important concept. Web Services are the 
set of protocols by which Services can be published, 
discovered and used in a technology neutral, standard 
form. 

• SOA is not just architecture of services seen from a 
technology perspective, but the policies, practices, and 
frameworks by which we ensure the right services are 
provided and consumed. 

• With SOA it is critical to implement processes that 
ensure that there are at least two different and separate 
processes, for provider and consumer. 

• Rather than leaving developers to discover individual 
services and put them into context, the Business Service 
Bus is instead their starting point that guides them to a 
coherent set that has been assembled for their domain. 
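
The register, query and invoke cycle described at the start of this section can be illustrated with a minimal sketch. It assumes an in-memory registry; the names ServiceRegistry, quote_service and "quotes" are hypothetical and are not part of the proposed system or of any web service standard.

```python
from typing import Callable, Dict


class ServiceRegistry:
    """Toy in-memory registry (an assumption for illustration only)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, endpoint: Callable[[str], str]) -> None:
        # Provider side: publish the binding information under a service name.
        self._bindings[name] = endpoint

    def lookup(self, name: str) -> Callable[[str], str]:
        # Consumer side: retrieve the binding information at runtime.
        if name not in self._bindings:
            raise LookupError(f"no provider registered for {name!r}")
        return self._bindings[name]


def quote_service(request: str) -> str:
    # Hypothetical provider implementation.
    return f"quote for {request}"


registry = ServiceRegistry()
registry.register("quotes", quote_service)

# The consumer knows only the service name, not the provider behind it.
endpoint = registry.lookup("quotes")
response = endpoint("order-17")
```

Because the consumer resolves the provider only at lookup time, the provider can be replaced in the registry without changing consumer code, which is the loose coupling that dynamic binding is meant to give.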

The value of SOA is derived from the runtime and 
design/development/configuration activities [2]. The web 
architecture of SOA is shown in figure 5, whereas the enterprise 
architecture of the service model is shown in figure 6. 

The development process gains speed through the reuse of 
services. Dynamic discovery and binding at runtime 
support loose coupling, leading to more stable and reliable 
applications. Today, agents are being applied in a wide range 
of industrial applications [15]. Most technology and 
market research companies, which advise their clients 
about technology's impact on business and 
consumers, agree that the adoption of a SOA 
paradigm is strategic and should be part of the most 
forward-looking software projects. Agents that require a service 
from another agent enter into a negotiation for that service to 
obtain a mutually acceptable price, time, and degree of 
quality. Successful negotiations result in binding agreements 
between agents [9, 10]. This agent-based approach offers a 
number of advantages over more typical workflow solutions 
to this problem. The proactive nature of the agents means 
services can be scheduled in a just-in-time fashion (rather 
than pre-specified from the beginning), and the responsive 
nature of the agents means that service exceptions can be 
detected and handled in a flexible manner [5, 12]. 


5. PROPOSED MODEL OF SERVICE ORIENTED 
ARCHITECTURE FOR ORGANIZATIONS 
A service-oriented architecture (SOA) is an application 
topology in which the business logic of the application is 
organized in modules (services) with an identity, purpose 
and access interfaces. Services behave as "black boxes" 
where their internal design is independent of the nature and 
purpose of the requestor [7]. In SOA, data and business logic 
are encapsulated in modular business components with 
documented interfaces. This helps to understand the design 
better and facilitates incremental development and future 
extensions. A SOA application can also be integrated with 
heterogeneous, external legacy and purchased applications 
more easily than a monolithic non-SOA application. 
Applications that have separate business layers are more 
suitable to access a SOA environment [10, 11]. 
The proposed system builds on emerging and more 
established technologies, which we aim to integrate with 
agent technology. It takes into account the need for SOA in 
organizations and for agent-based SOA for business dynamics, 
the business process and the behavior of the system in SOA, 
and the architecture related to web services. The proposed system 
consists of a number of specialized agents with different 
expertise. It comprises the Web agent, the 
Communication Service Agent (CSA), the Application Interface 
Agent (AIA), the Data Adaptation Agent (DAA), the Application 
agent, the Business Process and Data Retrieval 
agents, and the communication agent, which are architected to 
work together for optimized operation using 
SOA. 
The system architecture would be used in communities 
consisting of different kinds of agents, such as service providers, 
personal assistants and middle agents (e.g. service brokers, 
user profile managers, workflow managers, etc.), and other 
agents such as the Communication Service Agent (CSA), 
Application Interface Agent (AIA) and Data Adaptation Agent 
(DAA). These autonomous agents should be able to perform 
their tasks in cooperation or in competition with other agents 
and be able to interoperate with external entities (e.g., legacy 
software systems) for achieving their goals (semantic 
matching, service contracting etc.). They should have 
reasoning capabilities and support for dynamic behavior 
modification based on business rules. They should also be 
able to build workflows, compose the external Web services 
and monitor their execution. A distributed management 
should support the complete process. The use case of SOA 
scenario is shown in figure 7. 
The multiple agents that we have used have specific tasks 
and work in coordination with each other. When the 
user surfs the World Wide Web, he uses an agent to contact 
the Service Provider, which in turn uses the Services 
associated with it in order to provide assistance. In our 
proposed model the Service Provider takes the help of the 
Business Process Manager, whose task is to authenticate 
services for the Service Provider and to define the Services for the 
Business process. 

We are defining the following agents to comprise services 
and connections between the services. 


CSA (Communication Service Agent) 

Aim: to provide an interface between the Web server and the 
Website 

Task (action) of the agent: 

1. establishes communication between the Web server and 
the website 

2. provides exchange of services between the two 

3. transfers the data (request) 

(The above tasks are accomplished by the Communication 
Switching Agent) 

Procedure Sequence: 

1. a web user, on selecting a particular site, establishes a 
connection with the web server of that site 

2. communication is established between the two by the 
Communication Switching Agent 


AIA (Application Interface Agent) 

Aim: to provide an interface between the Web server and the 
application 

Task (action) of the agent: 

1. establishes an interface between the web server and the 
application 

2. checks for an adaptive environment (operating system and 
application platform) 

3. passes information from one website to the other as 
required, with the help of the Communication Service Agent 

(The above tasks are accomplished by the Interface Switching 
Agent) 

Procedure Sequence: 

1. the web server collects the information being searched 
and the selected site 

2. an interface is established between the web server and 
the selected application through any adaptive 
environment by the Interface Switching Agent 


DAA (Data Adaptation Agent) 

Aim: to help exchange and pass data between the Server 
and the Database 

Task (action) of the agent: 

1. to exchange the available / required data 

2. to help in adding new data 

3. to modify the existing data 

4. to delete the unwanted data 

(The above task of updating data is accomplished by the 
Data Exchange Agent) 

Procedure Sequence: 

1. the selected and required matter is made available from the 
huge repository available for the particular application 
in the data storage 

2. this data updating is performed by the Data Exchange 
Agent 

In the SOA configuration of our agent-based system, the 
user's request is processed with the help of an agent that 
searches for the contents in the web server, which in turn 
takes the help of an agent to check for adaptation with the 
application being asked for, and of another agent that 
retrieves the requested data from the database storage. 
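
The flow of a request through the three agents can be sketched as follows. This is only an illustration under stated assumptions: the class and method names, the platform check and the dictionary used as a data store are all hypothetical and do not describe the prototype under development.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DataAdaptationAgent:
    """DAA: exchanges, adds, modifies and deletes data in the data store."""
    store: Dict[str, str] = field(default_factory=dict)

    def fetch(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def add(self, key: str, value: str) -> None:
        self.store[key] = value

    def delete(self, key: str) -> None:
        self.store.pop(key, None)


@dataclass
class ApplicationInterfaceAgent:
    """AIA: connects the web server to an application on a supported platform."""
    data_agent: DataAdaptationAgent
    supported_platforms: Tuple[str, ...] = ("linux", "windows")  # assumed values

    def query(self, platform: str, key: str) -> Optional[str]:
        # Check for an adaptive environment before passing the request on.
        if platform not in self.supported_platforms:
            raise ValueError(f"unsupported platform: {platform}")
        return self.data_agent.fetch(key)


@dataclass
class CommunicationServiceAgent:
    """CSA: carries the user's request from the website to the web server."""
    interface_agent: ApplicationInterfaceAgent

    def handle(self, platform: str, key: str) -> str:
        result = self.interface_agent.query(platform, key)
        return result if result is not None else "not found"


daa = DataAdaptationAgent()
daa.add("invoice-1", "paid")
csa = CommunicationServiceAgent(ApplicationInterfaceAgent(daa))
answer = csa.handle("linux", "invoice-1")
```

Each agent holds a reference only to the next agent in the chain, so any one of them can be replaced without touching the others.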

Along with these agents there are a few other agents that are also 
useful for SOA. These are as follows: 

1. WEB AGENTS: these agents act as the interface 
between various requestors and responders on the World 
Wide Web 

2. APPLICATION AGENTS: these agents refer to various 
independent applications which can be 
contacted through the World Wide Web by other 
heterogeneous and homogeneous applications 

3. DATA RETRIEVAL AGENTS: these are agents which 
serve as a repository of the data that a requestor is requesting 
from an application 

4. COMMUNICATION AGENTS: these are agents which 
help in establishing connections between various agents 
(viz. between web agent and application agent, 
application agent and data retrieval agent, etc.) 


6. FUTURE WORK AND CONCLUSION 

A framework prototype of the Service Oriented Architecture 
for Business dynamics is currently under development, in which 
a SOA-based model is being designed and developed. The 
implementation results shall be presented in a subsequent 
publication. 

Future work under this research will focus on the following 
issues: 

1. Design of a conceptual framework for agent-based 
SOA, to provide decisions for the best 
communication between services 

2. Implementation of agents for SOA in a distributed 
environment 

3. Design and development of agents for SOA 

4. Algorithms for best agent-based SOA practices 

Figure 3. Services between Business Partners in SOA 

Figure 7. SOA Scenarios with Actors 





Figure 5. Web Architecture of SOA 


ABSTRACT 

This paper presents two models based on a pipeline approach 
for determining the pair-wise sequence alignment of two 
molecular sequences. One of the models considers a 
variation of the Needleman-Wunsch method as the basic 
algorithm and the other is based on the use of a scoring matrix for 
alignment. The basic purpose of using pipelines is to 
reduce the time complexity of alignment significantly. The paper 
also discusses the design and implementation of the basic 
linear version of the algorithms in our software tool, 
named "Sequence Comparison and Analysis Tool [SCAT]". 
Our tool also provides the option of sequence alignment on 
the basis of common groupings such as chemical, functional and 
structural. The software tool is implemented using the Visual 
Basic 6 package with a user-friendly Windows environment. 


KEYWORDS 
Sequence Alignment, Pipeline, Needleman-Wunsch 
Algorithm, Scoring Matrix etc. 


1. INTRODUCTION 

Sequence comparison can be defined as the problem of 
finding which parts of the sequences are similar and which 
parts are different [1,4,5]. It is regarded as the building block 
for many other, more complex problems such as multiple 
alignments (the comparison of a group of related sequences) 
and the construction of phylogenetic trees that explain the 
evolutionary relationship among species. Sequence 
comparison is actually a well-known problem in computer 
science. For the computer scientist, biomolecular sequences 
are just another source of data, indeed one that has 
experienced tremendous growth in interest, to the point that 
it has spawned an interdisciplinary field of its own, generally 
known as bioinformatics, computational molecular biology or 
just computational biology [4,5]. As biological databases 
grow in size, faster algorithms and tools are needed [6-15]. 
Our interest is to identify similarities and differences 
between two sequences by comparing them with each other. 
Generally, a measure of how similar they are is also 
desirable. A typical approach to solving this problem is to find 
a good and plausible alignment between the two sequences. 
If two sequences in an alignment share a common ancestor, 
mismatches can be interpreted as point mutations and gaps 
as indels (that is, insertion or deletion mutations) introduced 
in one or both lineages in the time since they diverged from 
one another. The objective is to match identical 
subsequences as far as possible. An alignment can be seen 
as a way of transforming one sequence into the other. Once 
the alignment is produced, a score can be assigned to each 
pair of aligned letters, called aligned pair, according to some 
chosen scoring scheme such as PAM and BLOSUM [4,5] 
that take into account physicochemical properties or 
evolutionary knowledge of the sequences being aligned. 
Computational approaches to sequence alignment generally 
fall into two categories: global alignments and local 
alignments. Calculating a global alignment is a form of 
global optimization that "forces" the alignment to span the 
entire length of all query sequences. By contrast, local 
alignments identify regions of similarity within long 
sequences that are often widely divergent overall. Local 
alignments are often preferable, but can be more difficult to 
calculate because of the additional challenge of identifying 
the regions of similarity. 


2. BACKGROUND 

In our proposed method we have applied a multi-Pipeline 
approach to the standard global alignment algorithm referred 
to as the Needleman-Wunsch method. So let us first understand the 
working principle behind the Needleman-Wunsch algorithm [2]. 
It computes the similarity between two sequences A and B 
of lengths n and m, respectively, using a dynamic 
programming approach. Dynamic 
programming is a strategy of building a solution gradually 
using simple recurrences [3]. The key observation for the 
alignment problem is that the similarity between sequences 
A[1..n] and B[1..m] can be computed by taking the 
maximum of the three following values: 

• The similarity of A[1..n-1] and B[1..m-1] plus the 
score of substituting A[n] for B[m]; 

• The similarity of A[1..n-1] and B[1..m] plus the score 
of deleting A[n]; 

• The similarity of A[1..n] and B[1..m-1] plus the score 
of inserting B[m]. 

From this observation, the following recurrence can be 
derived: 

\[
\mathrm{match}(A[1..i], B[1..j]) = \max
\begin{cases}
\mathrm{match}(A[1..i-1], B[1..j-1]) + \mathrm{sub}(A[i], B[j]) \\
\mathrm{match}(A[1..i-1], B[1..j]) + \mathrm{Del}(A[i]) \\
\mathrm{match}(A[1..i], B[1..j-1]) + \mathrm{Ins}(B[j])
\end{cases}
\]


where match (A, B) is a function that gives the similarity of 
two sequences A and B, and sub (a, b), Del (c) and Ins (c) 
are scoring functions that give the score of a substitution of 
character 'a' for character 'b', a deletion of character 'c', 
and an insertion of character 'c', respectively. 
This recurrence is completed by the following base case: 
match ( A[0], B[0] ) = 0, where A[0] and B[0] are defined as 
empty strings. 
To solve the problem with this recurrence, the algorithm 
generally builds an (n +1) x (m +1) matrix where each M[i, 
j] represents the similarity between sequences A[1..i] and 
B[1..j]. The first row and the first column represent 
alignments of one sequence with spaces. M[0, 0] represents 
the alignment of two empty strings, and is set to zero. All 
other entries are computed with the following formula: 

\[
M[i, j] =
\begin{cases}
M[i-1, j-1] + \mathrm{Substitute}(A[i], B[j]) & \text{if } A[i] = B[j] \\
\max\{\, M[i-1, j] + \mathrm{Del}(A[i]),\; M[i, j-1] + \mathrm{Ins}(B[j]) \,\} & \text{if } A[i] \neq B[j]
\end{cases}
\]
The matrix can be computed either row by row (left to right) 
or column by column (top to bottom). In the end, M[n, m] 
will contain the similarity score of the two sequences. Since 
there are (m+1) · (n+1) positions to compute and each takes a 
constant amount of work, this algorithm has time complexity 
[3] of O(mn), that is, O(n²) for sequences of comparable length. 
Clearly, it also has quadratic space complexity, 
since it needs to keep the entire matrix in memory. 
Once the matrix has been computed, the actual alignment 
can be retrieved by tracing a path in the matrix from the last 
position to the first. The trace is a simple procedure that 
compares the value at each M[i, j] to the values of its left, 
top and diagonal entries according to the formula given 
above. For instance, if M[i, j] = M[i, j-1] + Ins(B[j]), the 
trace reports an insertion of character B[j] and proceeds to 
entry M[i, j-1]. Alternatively, pointers can be saved on each 
entry during the computation of the matrix so that this 
evaluation step can be avoided at the cost of more memory 
usage. Since the path can be as long as O(m + n), this 
procedure has linear time complexity. Note that sometimes 
more than one path can be traversed and, as a result, multiple 
high-scoring alignments can be produced. In the matrix of 
Figure 1, two optimal alignments can be retrieved, for example: 

A = ACAAGACAG-CGT 
    | || | || ||| 
B = AGAACA-AGGCGT 
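
The matrix fill and the traceback can be sketched in Python as follows. This is a minimal illustration of the standard three-way recurrence, not the SCAT implementation; the scoring values (match +1, mismatch -1, gap -1) are assumptions, since the scores used for Figure 1 are not stated here.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment score and one optimal alignment of a and b."""
    n, m = len(a), len(b)
    # M[i][j] holds the similarity of a[:i] and b[:j].
    M: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = i * gap  # a[:i] aligned against spaces
    for j in range(1, m + 1):
        M[0][j] = j * gap  # b[:j] aligned against spaces

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(M[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          M[i - 1][j] + gap,   # delete a[i-1]
                          M[i][j - 1] + gap)   # insert b[j-1]

    # Trace back from M[n][m] to M[0][0] to recover one alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and M[i][j] == M[i - 1][j - 1] + sub(a[i - 1], b[j - 1])):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and M[i][j] == M[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return M[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Calling needleman_wunsch("ACAAGACAGCGT", "AGAACAAGGCGT") returns a score and two gapped strings of equal length. Which of several equally good alignments is returned depends on the order in which the traceback tests the three moves.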
It is often useful to see the dynamic programming solution 
for the sequence alignment problem as a directed weighted 
graph with (n +1) x (m +1) nodes representing each entry (i, 
j) of the matrix, and having the following edges: 

• ((i-1, j-1), (i, j)) with weight equal to sub(A[i], B[j]); 

• ((i-1, j), (i, j)) with weight equal to Del(A[i]); 

• ((i, j-1), (i, j)) with weight equal to Ins(B[j]). 

A path from node (0, 0) to (n, m) in the alignment graph 
corresponds to an alignment between the two sequences, and 
the problem of retrieving an optimal alignment is converted 
to the problem of finding a path in the graph with highest 
weight. 

The Needleman-Wunsch method works well for short sequences, 
but for longer sequences the performance of the algorithm 
degrades considerably due to its O(n²) behavior. Our 
proposed method improves the time complexity to O(n) 
by using multiple pipelines, which is a significant improvement. 


3. PROPOSED METHODOLOGY 
Problem 3.1: Sequence Alignment of two molecular 
sequences 
In recent years the use of parallel algorithms and methods 
[18,19,20] has gained a lot of attention from researchers, 
particularly in the area of sequence comparison related 
problems in molecular biology. We propose a multi-Pipeline 
strategy with two stages per pipeline for the alignment 
of two sequences. A delay of one unit of time is inserted in 
each successive pipeline, because each pipeline is data 
dependent on the previous one; the delay ensures that the 
data are available to each successive pipeline. Thus the 
pipelines do not start simultaneously; rather, 
they start in sequential order: with the 
initial clock pulse pipeline-1 comes into play, at the 
second clock pulse pipeline-2 takes off, and in similar 
fashion each of the other pipelines starts on successive clock 
pulses, one unit after its predecessor. 
In spite of this forced delay of one unit in each successive 
pipeline, the time complexity of the algorithm improves 
significantly. The computation involved in the two stages 
employed in each pipeline is given below, with the general 
assumption that each stage consumes one unit of cycle time. 
The general algorithm given below takes O(m*n) time, which 
becomes significant as the size of the sequences grows and 
is thus not feasible for long sequences. 
For i = 1 to m 
    For j = 1 to n 
        If A[i] = B[j] then 
            M[i,j] = M[i-1,j-1] + Substitute[A[i], B[j]] 
        Else 
            M[i,j] = Max{M[i-1,j] + Del[A[i]], M[i,j-1] + Ins[B[j]]} 
Consider two short sequences: 
ACAAG (length 5) 
AGAAC (length 5) 
We need to compute M[5,5]. 
M[0,j] and M[i,0] are initialized. 


Figure 3 shows the result of applying the general algorithm, 
which in this case takes 25 units of time to align two 
sequences each of length 5. Figure 4 shows how the matrix 
of order m*n is filled by applying the proposed method, 
allowing a delay of one unit at the beginning of each 
pipeline. The use of five pipelines is depicted. Clearly 
there is a significant improvement in the time complexity: 
it takes only 10 units of cycle time to complete the 
process. In general the time complexity can be given as 
O(c*n), where 'c' is a constant term, which is a 
significant improvement over O(m*n). 
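
The 10-unit figure can be reproduced with a small timing sketch. The timing model is an assumption made for illustration: pipeline i handles row i of the matrix, starts i - 1 units late, accepts one cell per unit, and each cell spends one unit in each of the two stages. The function names are hypothetical.

```python
from typing import List, Tuple


def pipeline_schedule(m: int, n: int, stages: int = 2) -> Tuple[List[List[int]], int]:
    """Return (finish-time table, total time) for an m x n matrix.

    Assumed model: row i (1-based) is computed by pipeline i, which starts
    i - 1 units late and issues one cell per time unit.
    """
    finish = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            start = (i - 1) + (j - 1)      # time at which cell (i, j) enters stage 1
            finish[i][j] = start + stages  # time at which its value is stored
    return finish, finish[m][n]


def dependencies_ready(finish: List[List[int]]) -> bool:
    """Check that each cell's neighbours are stored before its last stage runs."""
    m, n = len(finish) - 1, len(finish[0]) - 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            last_stage_start = finish[i][j] - 1
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 1 and pj >= 1 and finish[pi][pj] > last_stage_start:
                    return False
    return True


table, total = pipeline_schedule(5, 5)
sequential = 5 * 5
```

Under this model the total is m + n units (10 for two sequences of length 5, against 25 for the sequential loop), and the one-unit delay is just enough for the values from the previous row to be stored before they are needed.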

Figure 5 shows the general architecture of the 
proposed pipeline model with N functional units, including 
Fetch units [F], Decoders [D], an Execution unit with Adders 
[A], Comparators [C] and Storage units [S]. 


Problem 2: Determining the longest continuous 
subsequence with no gaps in two given sequences 

Sometimes we are more interested in finding the longest 
conserved region in two given molecular sequences. The 
proposed model based on the pipeline approach is an attempt to 
solve this problem. Again we propose a two-stage 
multi-Pipeline model. The input to the pipeline is two DNA 
sequences, which are converted in all six reading frames 
into the corresponding protein sequences, resulting in 
six pairs of amino acid sequences. For each sequence 
pair, a matrix of order m*n is constructed based on some 
scoring matrix, where m and n are the lengths of the two 
sequences. 
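
As an illustration of the six reading frames, the sketch below splits a DNA string into codons for the three forward frames and the three frames of the reverse complement. Translation of codons into amino acids is left out because it needs a full codon table; the function names are hypothetical and the input is assumed to contain only the letters A, C, G and T.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(_COMPLEMENT)[::-1]


def six_frames(dna: str) -> List[List[str]]:
    """Codon lists for the three forward and three reverse reading frames."""
    frames: List[List[str]] = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            # Drop trailing bases that do not fill a complete codon.
            usable = len(strand) - (len(strand) - offset) % 3
            frames.append([strand[k:k + 3] for k in range(offset, usable, 3)])
    return frames
```

Each of the six codon lists would then be translated into a protein sequence, giving the six sequences per input that are paired and fed to the pipelines.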

Here we propose the use of six pipelines, each with 
two stages, where each of the six pairs of obtained sequences is 
input to one pipeline. All six pairs of sequences can be 
aligned concurrently, thus improving the 
time complexity significantly. Figure 9 shows how the 
pipeline works for the matrix prepared in figure 7. 
Traditional algorithms would have taken O(6*n*m) time in 
the worst case, and even the best algorithm would have taken 
O(6*n) time. However, our strategy provides a 
better time complexity of O(n) in the worst case, with some 
overhead in required resources in the form of multiple 
functional units such as loaders and adders. This is a very 
significant improvement. 

All six pairs of obtained sequences can be mapped onto 
the six pipelines simultaneously, as shown in figure 9 (here 
we have not shown the six pairs of sequences obtained 
in the six reading frames). Scoring matrices are 
constructed for each of the six pairs of sequences, where 
values in the matrices are identified by the variables aij, 
bij, cij, dij, eij and fij. The pipelines have global variables 
named S, T, O, N, X and Z, respectively, which 
compute the diagonal sums starting from successive residue 
positions for each of the six sequence pairs. Then we look for the 
maximum of the obtained sum values in each of the 
sequence pairs. For example, for the sample 
sequences above, the sum S2 = a21 + a32 + a43 + a54 = 10 + 7 + 6 + 8 = 31 
is the maximum among the sum values S1, S2, S3, S4 
and S5. 

The best alignment corresponding to one of the obtained pairs 
of sequences (one of the six reading frames) is a21, a32, a43, a54, 
i.e. at pipeline 1. 

 DALTN 
 |||| 
TDALT 

where aligned characters are marked by pipe symbols. 
Similarly, the alignments for the other pairs of sequences can 
be obtained simultaneously, reducing the time complexity of 
the algorithm significantly. 
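
The search for the best gap-free diagonal can be sketched as below for the sample pair DALTN and TDALT. The identity scores for D, A, L and T are the values quoted in the worked sum (10, 7, 6, 8); the mismatch score of -1 and the default identity score of 1 are placeholders assumed for this sketch, not BLOSUM-80 values, and the function names are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Identity scores quoted in the worked example; everything else is a placeholder.
EXAMPLE_IDENTITY = {"D": 10, "A": 7, "L": 6, "T": 8}


def score(x: str, y: str) -> int:
    if x == y:
        return EXAMPLE_IDENTITY.get(x, 1)  # placeholder for other residues
    return -1                              # placeholder mismatch score


def best_diagonal(rows: str, cols: str,
                  score_fn: Callable[[str, str], int] = score) -> Tuple[int, int]:
    """Return (offset, total) of the diagonal with the largest score sum.

    Cell (i, j) lies on the diagonal with offset i - j (0-based indices).
    """
    sums: Dict[int, int] = {}
    for i, x in enumerate(rows):
        for j, y in enumerate(cols):
            sums[i - j] = sums.get(i - j, 0) + score_fn(x, y)
    offset = max(sums, key=lambda d: sums[d])
    return offset, sums[offset]


offset, total = best_diagonal("TDALT", "DALTN")
```

With these scores the best diagonal has offset 1 and sum 31, which corresponds to aligning DALT in both sequences without gaps. In the proposed model each of the six sequence pairs would run this computation on its own pipeline.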


4. IMPLEMENTATION 

Here we show the screen formats of the 
implementation of the linear versions of the presented 
algorithms in our tool, named 'Sequence Comparison and 
Analysis Tool'. The tool provides solutions to a 
number of sequence comparison problems prevalent in 
molecular biology. Figure 10 shows the interface that 
captures all the input details for aligning two sequences. As 
can be seen, sequence alignment can be done in four ways, 
i.e. nucleotide to nucleotide, nucleotide to protein, 
protein to protein and protein to nucleotide. For a given 
input DNA sequence, one can consider not only its upper and 
lower strands but also the reverse strand in either case. 
Alignment can be done for all the sequences obtained in the six 
reading frames. Both local and global alignments are 
possible. One can also provide the values for residue match, 
mismatch and gap. A number of algorithms, including 
standard and self-developed ones (the latter are part of our 
research papers already published in various journals and 
conferences [21-27]), are implemented in our tool 
(a description of these algorithms is beyond the scope of this 
paper). One can also align the sequences based on various 
scoring matrices such as PAM and BLOSUM. Four types 
of alignment have been considered, i.e. exact alignment, gap 
alignment, alignment based on groupings and ends-free 
alignment. The result window is user-friendly, showing the 
alignment score and the percentage of matched residues. 


5. CONCLUSIONS AND FUTURE WORK 

The proposed models can be easily implemented on parallel 
computers with multiple functional pipelines and will 
improve the time complexity of aligning the sequences. 
The assumption of multiple pipelines and functional units 
improves the time complexity of the standard algorithms 
considerably, from O(n²) to O(n). The most significant 
part of the algorithm is its ability to align more than one pair 
of sequences simultaneously with no additional overhead. 
The use of data-flow computers can be quite useful for the 
sequence alignment problem discussed and can provide an even 
better solution for sequence comparison types of jobs. We 
hope to come up with a better solution in our next paper by 
using the strategy of data-flow computing. 

Figure 1: Standard dynamic programming matrix for the 
global alignment of 
A=ACAAGACAGCGT and 
B=AGAACAAGGCGT with paths to retrieve 
optimal alignments indicated with arrows. 


Solving Sequence Alignment Problem Using Pipeline Approach 


Figure 3. Result of alignment [Algorithm will take 25 (5*5) units of time] 

Figure 4. Result of the proposed model with one unit of 
delay in each successive pipeline 


Figure 5. General architecture of the proposed 
Pipeline [F: Fetch unit, D: Decode Unit, 
C: Comparator, A: adders, S: store units] 


Copy Right © BIJIT – 2009 




Figure 7. Alignment scores using BLOSUM-80 















Architecture with multiple Functional Units 
[P: Pipeline, F: Fetch unit, D: Decode unit, A: Adder, 
L: Loader, C: Comparator, S: Storage unit] 





Figure 9. Working of the pipelines 











Result window showing alignment of the two input 
sequences with alignment score. 




BVICAM's International Journal of Information Technology 


Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi 


Distribution Based Change-Point Problem With Two Types of Imperfect Debugging in 
Software Reliability 


P. K. Kapur1, Sameer Anand2 and V. B. Singh3 
1Department of Operational Research, University of Delhi, India 
2S. S. College of Business Studies, University of Delhi, India 
3Delhi College of Arts & Commerce, University of Delhi, India 


E-Mail: 1pkkapurl@gmail.com, 2sanand_or@yahoo.com, 3singh_vb@rediffmail.com 


ABSTRACT 

Software testing is an important phase of the software 
development life cycle. It controls the quality of the software 
product. Due to the complexity of the software system and 
incomplete understanding of the software, the testing team may 
not be able to remove/correct a fault perfectly on 
observation/detection of a failure, and the original fault may 
remain, resulting in a phenomenon known as imperfect 
debugging, or get replaced by another fault, causing fault 
generation. In the case of imperfect debugging, the fault content 
of the software remains the same, while in the case of fault 
generation, the fault content increases as the testing 
progresses and removal/correction results in the introduction of 
new faults while removing/correcting old ones. During 
software testing the fault detection/correction rate may not be 
the same throughout the whole testing process, but may 
change at any moment in time. In the literature various 
software reliability models have been proposed 
incorporating the change-point concept. In this paper we 
propose a distribution based change-point problem with two 
types of imperfect debugging in software reliability. The 
models developed have been validated and verified using 
real data sets. Estimated parameters and comparison 
criteria results are also presented. 


KEYWORDS 
Non-homogenous Poisson process, software reliability 
growth model, hazard rate, imperfect debugging. 


NOTATION 
m(t): the mean value function or the expected number of 
faults detected or removed by time t. 
a(t): total fault content of software dependent on time. 
p: the probability of fault removal on a failure (i.e., 
the probability of perfect debugging). 
α: the rate at which the faults/errors may be introduced 
during the debugging process. 
b: fault removal/correction rate. 
λ(t): intensity function for NHPP models or fault 
detection rate per unit time. 
F(t): distribution functions for fault removal/correction 
times. 
f(t): density functions for fault removal/correction 
times. 
z(t): hazard rate function. 
β: learning parameter in logistic function. 


1. INTRODUCTION 

Computer software is embedded in systems of all kinds: 
transportation, medical, telecommunications, military, 
industrial processes, entertainment, office products; the list 
is almost endless. Software is virtually inescapable in the 
modern world. And as we move into the twenty-first 
century, it will become the driver for new advances in 
everything from elementary education to genetic 
engineering. Software development consists of different 
phases: requirement analysis, design, coding, testing, 
implementation and maintenance, together called the SDLC. 
The Software 
Reliability Growth Model (SRGM) is a tool that can be 
used to evaluate the software quantitatively, develop test 
status and schedule status, and monitor the changes in reliability 
performance. 
Research has been conducted in software reliability 
engineering over the past three decades and many software 
reliability growth models (SRGM) have been proposed. The 
pioneering attempt in non-homogenous Poisson process 
based on SRGM was made by Goel and Okumoto (G-O) [1]. 
The model describes the failure observation phenomenon by 
an exponential curve. There are also SRGMs that describe 
either S-shaped curves or a mixture of exponential and 
S-shaped curves (flexible). Some of the important 
contributions to this type of model are due to Yamada et 
al. [27], Ohba [18], Bittanti et al. [3], Kapur and Garg [16], 
Kapur et al. [17], Pham [24], etc. 
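
For reference, the exponential curve of the G-O model is m(t) = a(1 - e^{-bt}), with intensity λ(t) = a b e^{-bt}. The sketch below evaluates both; here a is taken as a constant initial fault content and b as the fault detection rate, and the parameter values used are arbitrary illustrations, not estimates from any data set.

```python
import math


def go_mean_value(t: float, a: float, b: float) -> float:
    """Expected number of faults removed by time t: m(t) = a(1 - e^{-bt})."""
    return a * (1.0 - math.exp(-b * t))


def go_intensity(t: float, a: float, b: float) -> float:
    """Failure intensity: lambda(t) = dm/dt = a b e^{-bt}."""
    return a * b * math.exp(-b * t)


# Illustrative values only: 100 initial faults, detection rate 0.1 per unit time.
expected_after_10 = go_mean_value(10.0, a=100.0, b=0.1)
```

The curve starts at zero, rises with a steadily decreasing slope and approaches a as t grows, which is why the model cannot reproduce S-shaped failure data.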
In most of the models discussed above it is assumed that 
whenever an attempt is made to remove a fault, it is removed 
with certainty, i.e. a case of perfect debugging. But the 
debugging activity is not always perfect because of a number 
of factors such as the tester's skill/expertise. In a practical 
software development scenario, the number of failures 
observed/detected may not necessarily be the same as the 
number of errors removed/corrected. Kapur and Garg [16] 
have discussed in their error removal phenomenon model 
that as testing grows and the testing team gains experience, 
additional faults are removed without them 
causing any failure. 

The testing team, however, may not be able to 
remove/correct a fault perfectly on observation/detection of a 
failure, and the original fault may remain, leading to a 
phenomenon known as imperfect debugging, or be replaced by 
another fault, resulting in fault generation. In the case of 
imperfect debugging the fault content of the software is not 
changed, but because of incomplete understanding of the 
software, the original detected fault is not removed perfectly. 
But in case of fault generation, the total fault content 
increases as the testing progresses because new faults are 
introduced in the system while removing the old original 
faults. 

The model due to Ohba and Chou [18] is a fault generation
model applied to the G-O model and has also been named an
imperfect debugging model. Kapur and Garg [22]
introduced imperfect debugging in the G-O model. They
assumed that the fault detection rate (FDR) per remaining
fault is reduced due to imperfect debugging. Thus the number
of failures observed/detected by time infinity is more than
the initial fault content. Although these two models describe
the imperfect debugging phenomenon, the software
reliability growth curve of these models is always
exponential. Moreover, they assume that the probability of
imperfect debugging is independent of the testing time.
Thus, they ignore the role of the learning process during the
testing phase by not accounting for the experience gained
with the progress of software testing. Pham [24] developed
an SRGM for multiple failure types incorporating fault
generation. Zhang et al. [26] proposed a testing efficiency
model which includes both imperfect debugging and fault
generation, modeling it on the number of failures
experienced/observed/detected; however, both imperfect
debugging and fault generation are actually seen during fault
removal/correction. Recently, Kapur et al. [12] proposed a
flexible SRGM with imperfect debugging and fault
generation using a logistic function for the fault detection
rate, which reflects the efficiency of the testing/removal team.

We execute the program in a specific environment and
improve its quality by detecting and correcting faults. Many
SRGM assume that, during the fault detection process, each
failure caused by a fault occurs independently and randomly
in time according to the same distribution (Musa et al. [21]).
But the failure distribution can be affected by many factors
such as running environment, testing strategy, defect density
and resource allocation. In practice, if we want to detect
more faults in a short period of time, we may introduce new
techniques or tools that are not yet in use, or bring in
consultants to make a radical software risk analysis. In
addition, newly proposed automated testing tools increase
test coverage and can regularly replace traditional manual
software testing. The benefits to software developers/testers
include increased software quality, reduced testing costs,
improved release time to market, repeatable test steps, and
improved testing productivity. These technologies can make
software testing and correction easier, detect more bugs,
save time, and reduce expense. Altogether, we expect that
consultants, new automated test tools or techniques can
greatly help in detecting additional faults that are difficult
to find during regular testing and usage, in identifying and
correcting faults most cost effectively, and in assisting
clients to improve their software development process. Thus,
the fault detection rate may not be smooth and can change at
some time moment τ called the change-point.

Many researchers have incorporated the change-point in
software reliability growth modeling. Zhao [28] first
incorporated the change-point in software and hardware
reliability. Huang et al. [7] used the change-point in
software reliability growth modeling with testing effort
functions. Imperfect debugging with a change-point was
introduced in software reliability growth modeling by Shyur
[25]. Kapur et al. [9,14] introduced various testing effort
functions and testing effort control with a change-point in
software reliability growth modeling. Goswami et al. [6] and
Kapur et al. [15] proposed software reliability growth
models for errors of different severity using a change-point.
Multiple change-points in software reliability growth
modeling for fielded software were proposed by Kapur et al.
[11]. Later, an SRGM based on stochastic differential
equations incorporating the change-point concept was
proposed by Kapur et al. [10].


2. BASIC ASSUMPTION 

The NHPP models are based on the assumption that the
software system is subject to failures at random times caused
by manifestation of remaining faults in the system. Hence
NHPP are used to describe the failure phenomenon during
the testing phase. The counting process {N(t), t ≥ 0} of an
NHPP is given as follows.

$$\Pr\{N(t) = k\} = \frac{(m(t))^{k}}{k!}\, e^{-m(t)}, \quad k = 0, 1, 2, \ldots \tag{1}$$

and

$$m(t) = \int_{0}^{t} \lambda(x)\, dx \tag{2}$$

The intensity function λ(x) (or the mean value function m(t))
is the basic building block of all the NHPP models existing
in the software reliability engineering literature.
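
Equation (1) can be evaluated numerically once a mean value function is chosen. The sketch below is only an illustration: it assumes the exponential mean value function m(t) = a(1 - exp(-bt)) usually associated with the G-O model, and the parameter values and function names are hypothetical, not taken from the paper.

```python
import math


def nhpp_pmf(k, m_t):
    """Pr{N(t) = k} from equation (1), given the value m_t = m(t).

    Computed in log space so that large k does not overflow.
    """
    if m_t == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(m_t) - m_t - math.lgamma(k + 1))


def goel_okumoto_mean(t, a, b):
    """Exponential mean value function m(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


# Hypothetical parameter values, chosen only for illustration.
m_10 = goel_okumoto_mean(t=10.0, a=100.0, b=0.05)
probs = [nhpp_pmf(k, m_10) for k in range(200)]
expected_count = sum(k * p for k, p in enumerate(probs))
```

The expected count computed from the probabilities should agree with m(t), which is the defining property of the mean value function.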
The proposed models are based upon the following basic 
assumptions: 
1. The failure observation/fault removal phenomenon is
modeled by an NHPP.
2. Software is subject to failures during execution caused
by faults remaining in the software.
3. The failure rate is equally affected by all the faults
remaining in the software.
4. When a software failure occurs, an instantaneous repair
effort starts and the following may occur:
(a) Fault content is reduced by one with
probability p.
(b) Fault content remains unchanged with
probability 1-p.
5. During the fault removal process, whether the fault is
removed successfully or not, new faults are generated
with a constant probability α.
6. The fault detection/removal rate may change at any time
moment.

Assumptions 4 and 5 capture the effect of imperfect
debugging and fault generation respectively.


3. MODEL DEVELOPMENT 

In this section, we formulate distribution based software 
reliability growth models incorporating change-point and 
two types of imperfect debugging. Since the faults in the 
software systems are detected and eliminated during the 
testing phase, the number of faults remaining in the software 
system gradually decreases as the testing procedure goes on. 
Thus under the common assumptions for software reliability
growth modeling, we consider the following linear
differential equation.

$$\frac{dm(t)}{dt} = b(t)\,\bigl(a - m(t)\bigr) \tag{3}$$

where b(t) is the fault detection rate per remaining fault at
testing time t. Here we take the fault detection rate to be the
hazard rate z(t), treat the fault content not as a constant but
as a function of time a(t), and incorporate imperfect
debugging through the probability p. So the above equation
can be written as

$$\frac{dm(t)}{dt} = z(t)\,p\,\bigl(a(t) - m(t)\bigr)$$

We assume that faults can be introduced during the
debugging phase with a constant fault introduction rate α.
Therefore, the fault content function, a(t), is a linear
function of the expected number of faults detected by time t.
That is, a(t) = a + α m(t), and the above equation becomes

$$\frac{dm(t)}{dt} = z(t)\,p\,\bigl(a + \alpha\, m(t) - m(t)\bigr) \tag{4}$$
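
As a worked step, equation (4) can be solved in closed form when the parameters do not change over time. The derivation below is a sketch that assumes the hazard rate z(t) = f(t)/(1 - F(t)) and the initial conditions m(0) = 0 and F(0) = 0; these conditions are assumptions made here for the illustration.

```latex
\begin{align*}
\frac{dm(t)}{dt} &= z(t)\,p\,\bigl(a - (1-\alpha)\,m(t)\bigr),
  \qquad z(t) = \frac{f(t)}{1 - F(t)} \\
\frac{d}{dt}\ln\bigl(a - (1-\alpha)\,m(t)\bigr)
  &= -p\,(1-\alpha)\,z(t)
   = p\,(1-\alpha)\,\frac{d}{dt}\ln\bigl(1 - F(t)\bigr) \\
a - (1-\alpha)\,m(t) &= a\,\bigl(1 - F(t)\bigr)^{p(1-\alpha)} \\
m(t) &= \frac{a}{1-\alpha}\Bigl[1 - \bigl(1 - F(t)\bigr)^{p(1-\alpha)}\Bigr]
\end{align*}
```

With a change-point, the same steps are applied separately on each side of τ, with m(τ) carried over as the starting value of the second piece.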


In the proposed model we assume that the hazard rate
z(t) = f(t) / (1 - F(t)), the fault introduction rate α and the
probability of perfect debugging p may change at some
time moment τ called the change-point.

After incorporating the change-point, we get the following
forms of the fault detection/removal rate, the probability of
perfect debugging and the fault generation rate.

$$z(t) = \begin{cases} \dfrac{f_1(t)}{1 - F_1(t)} & \text{for } t \le \tau \\[2ex] \dfrac{f_2(t)}{1 - F_2(t)} & \text{for } t > \tau \end{cases} \tag{5}$$

where F_1(t), f_1(t) and F_2(t), f_2(t) are the distribution and
density functions before and after the change-point
respectively.


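The probability of perfect debugging will be

$$p = \begin{cases} p_1 & \text{for } t \le \tau \\ p_2 & \text{for } t > \tau \end{cases} \tag{6}$$

and the fault content function

$$a(t) = \begin{cases} a + \alpha_1 m(t) & \text{for } t \le \tau \\ a + \alpha_1 m(\tau) + \alpha_2 \bigl(m(t) - m(\tau)\bigr) & \text{for } t > \tau \end{cases} \tag{7}$$

Equation (5) can be rewritten as

$$z(t) = \frac{f_1(t)}{1 - F_1(t)}\, U(\tau - t) + \frac{f_2(t)}{1 - F_2(t)}\, U(t - \tau)$$

using the unit step function given by

$$U(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$$

Similarly, equations (6) and (7) can also be rewritten as

$$p = p_1\, U(\tau - t) + p_2\, U(t - \tau)$$

$$a(t) = \bigl(a + \alpha_1 m(t)\bigr)\, U(\tau - t) + \bigl(a + \alpha_1 m(\tau) + \alpha_2 (m(t) - m(\tau))\bigr)\, U(t - \tau)$$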
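Now using equations (5), (6) and (7), equation (4) can be
rewritten as

$$\frac{dm(t)}{dt} = \begin{cases} \dfrac{f_1(t)}{1 - F_1(t)}\, p_1 \bigl(a + \alpha_1 m(t) - m(t)\bigr) & \text{for } t \le \tau \\[2ex] \dfrac{f_2(t)}{1 - F_2(t)}\, p_2 \bigl(a + \alpha_1 m(\tau) + \alpha_2 (m(t) - m(\tau)) - m(t)\bigr) & \text{for } t > \tau \end{cases} \tag{8}$$

After solving the above equations with m(0) = 0 and m(t)
continuous at τ, we get the following solutions

$$m(t) = \begin{cases} \dfrac{a}{1 - \alpha_1}\Bigl[1 - \bigl(1 - F_1(t)\bigr)^{p_1(1 - \alpha_1)}\Bigr] & \text{for } t \le \tau \\[2ex] \dfrac{a}{1 - \alpha_2}\Bigl[1 - \bigl(1 - F_1(\tau)\bigr)^{p_1(1 - \alpha_1)} \Bigl(\dfrac{1 - F_2(t)}{1 - F_2(\tau)}\Bigr)^{p_2(1 - \alpha_2)}\Bigr] + \dfrac{\alpha_1 - \alpha_2}{1 - \alpha_2}\, m(\tau) & \text{for } t > \tau \end{cases} \tag{9}$$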
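
The piecewise solution of the change-point equations can be written once for any pair of distributions by passing in their survival functions 1 - F_1(t) and 1 - F_2(t). The sketch below assumes m(0) = 0 and continuity of m(t) at τ; the function names and the parameter values in the example are hypothetical.

```python
import math


def mean_value(t, a, tau, p1, p2, alpha1, alpha2, surv1, surv2):
    """Mean value function m(t) with a single change-point at tau.

    surv1 and surv2 are callables returning 1 - F1(t) and 1 - F2(t).
    """
    e1 = p1 * (1.0 - alpha1)  # exponent before the change-point
    e2 = p2 * (1.0 - alpha2)  # exponent after the change-point
    if t <= tau:
        return a / (1.0 - alpha1) * (1.0 - surv1(t) ** e1)
    m_tau = a / (1.0 - alpha1) * (1.0 - surv1(tau) ** e1)
    decay = surv1(tau) ** e1 * (surv2(t) / surv2(tau)) ** e2
    return (a / (1.0 - alpha2) * (1.0 - decay)
            + (alpha1 - alpha2) / (1.0 - alpha2) * m_tau)


def exponential_survival(b):
    """Survival function 1 - F(t) = exp(-b * t) of an exponential distribution."""
    return lambda t: math.exp(-b * t)


# Hypothetical parameter values, for illustration only.
example = dict(a=100.0, tau=5.0, p1=0.9, p2=0.95, alpha1=0.05, alpha2=0.02,
               surv1=exponential_survival(0.1),
               surv2=exponential_survival(0.2))
curve = [mean_value(float(t), **example) for t in range(0, 21)]
```

Keeping the distribution outside the formula means that the exponential, Erlang and logistic cases differ only in the survival functions supplied.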


SRGM-1 

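The following exponential distribution function is used to
model SRGM-1:

Let T ~ exp(b_1) for t ≤ τ, i.e.

$$F_1(t) = 1 - \exp(-b_1 t) \quad \text{for } t \le \tau \tag{10}$$

and let T ~ exp(b_2) for t > τ, i.e.

$$F_2(t) = 1 - \exp(-b_2 t) \quad \text{for } t > \tau \tag{11}$$

Substituting the values of F_1(t) and F_2(t) from Equations
(10) and (11) into Equation (9), we get:

$$m(t) = \begin{cases} \dfrac{a}{1 - \alpha_1}\Bigl[1 - \exp\bigl(-b_1 p_1 (1 - \alpha_1)\, t\bigr)\Bigr] & \text{for } t \le \tau \\[2ex] \dfrac{a}{1 - \alpha_2}\Bigl[1 - \exp\bigl(-b_1 p_1 (1 - \alpha_1)\, \tau - b_2 p_2 (1 - \alpha_2)(t - \tau)\bigr)\Bigr] + \dfrac{\alpha_1 - \alpha_2}{1 - \alpha_2}\, m(\tau) & \text{for } t > \tau \end{cases} \tag{12}$$

The above model reduces to the model given by
Shyur [25] if we consider perfect debugging and no fault
generation.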
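
The reduction mentioned above can be checked numerically. The sketch below codes the SRGM-1 mean value function with exponential distributions on both sides of the change-point, then compares it with the form obtained by setting p_1 = p_2 = 1 and α_1 = α_2 = 0. Function names and parameter values are hypothetical, and the reduced form is written directly from that substitution rather than taken from Shyur [25].

```python
import math


def srgm1_mean(t, a, b1, b2, p1, p2, alpha1, alpha2, tau):
    """SRGM-1 mean value function (exponential distributions, one change-point)."""
    r1 = b1 * p1 * (1.0 - alpha1)
    r2 = b2 * p2 * (1.0 - alpha2)
    if t <= tau:
        return a / (1.0 - alpha1) * (1.0 - math.exp(-r1 * t))
    m_tau = a / (1.0 - alpha1) * (1.0 - math.exp(-r1 * tau))
    return (a / (1.0 - alpha2) * (1.0 - math.exp(-r1 * tau - r2 * (t - tau)))
            + (alpha1 - alpha2) / (1.0 - alpha2) * m_tau)


def perfect_debugging_mean(t, a, b1, b2, tau):
    """Special case p1 = p2 = 1 and alpha1 = alpha2 = 0."""
    if t <= tau:
        return a * (1.0 - math.exp(-b1 * t))
    return a * (1.0 - math.exp(-b1 * tau - b2 * (t - tau)))


# Hypothetical values, for illustration only.
times = [0.0, 2.5, 5.0, 7.5, 20.0]
general = [srgm1_mean(t, 100.0, 0.1, 0.2, 1.0, 1.0, 0.0, 0.0, 5.0) for t in times]
reduced = [perfect_debugging_mean(t, 100.0, 0.1, 0.2, 5.0) for t in times]
```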
SRGM-2 
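Let F(t) be a two-stage Erlangian distribution function, i.e.
T ~ Erlang-2(b_1) for t ≤ τ,

$$F_1(t) = 1 - (1 + b_1 t)\exp(-b_1 t) \quad \text{for } t \le \tau \tag{13}$$

and T ~ Erlang-2(b_2) for t > τ,

$$F_2(t) = 1 - (1 + b_2 t)\exp(-b_2 t) \quad \text{for } t > \tau \tag{14}$$

Substituting the values of F_1(t) and F_2(t) from Equations
(13) and (14) into Equation (9), we get:

$$m(t) = \begin{cases} \dfrac{a}{1 - \alpha_1}\Bigl[1 - \bigl((1 + b_1 t)\, e^{-b_1 t}\bigr)^{p_1(1 - \alpha_1)}\Bigr] & \text{for } t \le \tau \\[2ex] \dfrac{a}{1 - \alpha_2}\Bigl[1 - (1 + b_1 \tau)^{p_1(1 - \alpha_1)} \Bigl(\dfrac{1 + b_2 t}{1 + b_2 \tau}\Bigr)^{p_2(1 - \alpha_2)} \exp\bigl(-b_1 p_1 (1 - \alpha_1)\, \tau - b_2 p_2 (1 - \alpha_2)(t - \tau)\bigr)\Bigr] + \dfrac{\alpha_1 - \alpha_2}{1 - \alpha_2}\, m(\tau) & \text{for } t > \tau \end{cases} \tag{15}$$

The above model reduces to the model given by
Archana [2] if we consider perfect debugging and no
fault generation.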
SRGM-3 
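Let F(t) be a logistic distribution function, i.e.
T ~ logistic distribution (b_1, β_1) for t ≤ τ,

$$F_1(t) = \frac{1 - \exp(-b_1 t)}{1 + \beta_1 \exp(-b_1 t)} \quad \text{for } t \le \tau \tag{16}$$

and T ~ logistic distribution (b_2, β_2) for t > τ,

$$F_2(t) = \frac{1 - \exp(-b_2 t)}{1 + \beta_2 \exp(-b_2 t)} \quad \text{for } t > \tau \tag{17}$$

Substituting the values of F_1(t) and F_2(t) from Equations
(16) and (17) into Equation (9), we get:

$$m(t) = \begin{cases} \dfrac{a}{1 - \alpha_1}\Bigl[1 - \Bigl(\dfrac{(1 + \beta_1)\, e^{-b_1 t}}{1 + \beta_1 e^{-b_1 t}}\Bigr)^{p_1(1 - \alpha_1)}\Bigr] & \text{for } 0 \le t \le \tau \\[2ex] \dfrac{a}{1 - \alpha_2}\Bigl[1 - \Bigl(\dfrac{1 + \beta_1}{1 + \beta_1 e^{-b_1 \tau}}\Bigr)^{p_1(1 - \alpha_1)} \Bigl(\dfrac{1 + \beta_2 e^{-b_2 \tau}}{1 + \beta_2 e^{-b_2 t}}\Bigr)^{p_2(1 - \alpha_2)} \exp\bigl(-b_1 p_1 (1 - \alpha_1)\, \tau - b_2 p_2 (1 - \alpha_2)(t - \tau)\bigr)\Bigr] + \dfrac{\alpha_1 - \alpha_2}{1 - \alpha_2}\, m(\tau) & \text{for } t > \tau \end{cases} \tag{18}$$

To further simplify the estimation procedure we may
assume α_1 = α_2 = α and p_1 = p_2 = p.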
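
Under the simplification of equal α and equal p on both sides of the change-point, the term in m(τ) drops out of the logistic model and the mean value function depends on the data only through the two survival functions. The sketch below assumes the logistic distribution function F(t) = (1 - exp(-bt)) / (1 + β exp(-bt)); the function names and parameter values are hypothetical.

```python
import math


def logistic_survival(t, b, beta):
    """1 - F(t) for F(t) = (1 - exp(-b t)) / (1 + beta * exp(-b t))."""
    e = math.exp(-b * t)
    return (1.0 + beta) * e / (1.0 + beta * e)


def srgm3_mean(t, a, b1, b2, beta1, beta2, p, alpha, tau):
    """SRGM-3 mean value function with alpha1 = alpha2 = alpha and p1 = p2 = p."""
    exponent = p * (1.0 - alpha)
    if t <= tau:
        return a / (1.0 - alpha) * (1.0 - logistic_survival(t, b1, beta1) ** exponent)
    ratio = logistic_survival(t, b2, beta2) / logistic_survival(tau, b2, beta2)
    decay = (logistic_survival(tau, b1, beta1) * ratio) ** exponent
    return a / (1.0 - alpha) * (1.0 - decay)


# Hypothetical values, for illustration only.
args = dict(a=120.0, b1=0.15, b2=0.25, beta1=3.0, beta2=2.0,
            p=0.9, alpha=0.05, tau=6.0)
s_curve = [srgm3_mean(float(t), **args) for t in range(0, 31)]
```

As t grows, the curve levels off at a / (1 - α), which exceeds the initial fault content a because faults are generated during removal.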


4. MODEL VALIDATION, COMPARISON
CRITERIA AND DATA ANALYSES
Model Validation

To illustrate the estimation procedure and application of the
SRGM (existing as well as proposed) we have carried out
the data analysis of real software data sets. The parameters
of the models have been estimated using the statistical
package SPSS and the change-points of the data sets have
been judged using a change-point analyzer.

Data set 1 (DS-1)

The first data set (DS-1) was collected during 35
months of testing a radar system of size 124 KLOC;
1301 faults were detected during testing. This data is cited
from Brooks and Motley [4]. The change-point for this data
set is the 17th month.

Data set 2 (DS-2)

The second data set (DS-2) was collected during 19
weeks of testing a real time command and control system;
328 faults were detected during testing. This data is
cited from Ohba [19]. The change-point for this data set is

week. 


5. COMPARISON CRITERIA FOR SRGM 

The performance of an SRGM is judged by its ability to fit
the past software fault data (goodness of fit) and to predict
the future behavior of the fault.

Goodness of Fit criteria

The term goodness of fit is used in two different contexts. In
one context, it denotes the question of whether a sample of
data came from a population with a specific distribution. In
another context, it denotes the question of how well a
mathematical model (for example a linear regression model)
fits the data.

The Mean Square Error (MSE):

The model under comparison is used to simulate the fault
data; the difference between the expected values, \hat{m}(t_i),
and the observed data y_i is measured by MSE as follows.

$$MSE = \frac{1}{k}\sum_{i=1}^{k}\bigl(\hat{m}(t_i) - y_i\bigr)^{2}$$

where k is the number of observations. A lower MSE
indicates less fitting error, thus better goodness of fit [17].

Coefficient of Multiple Determination (R^2):

We define this coefficient as the ratio of the sum of squares
resulting from the trend model to that from the constant
model, subtracted from 1, i.e.

$$R^{2} = 1 - \frac{\text{residual SS}}{\text{corrected SS}}$$

R^2 measures the percentage of the total variation about the
mean accounted for by the fitted curve. It ranges in value from
0 to 1. Small values indicate that the model does not fit the
data well. The larger R^2, the better the model explains the
variation in the data [17].

Bias

The difference between the observation and prediction of the
number of failures at any instant of time i is known as PE_i
(prediction error). The average of the PEs is known as bias.
The lower the value of Bias, the better the goodness of fit [8].

Variation

The standard deviation of the prediction error is known as
variation.

$$Variation = \sqrt{\frac{1}{k - 1}\sum_{i=1}^{k}\bigl(PE_i - Bias\bigr)^{2}}$$

The lower the value of Variation, the better the goodness of
fit [8].

Root Mean Square Prediction Error

It is a measure of the closeness with which a model predicts
the observation.

$$RMSPE = \sqrt{Bias^{2} + Variation^{2}}$$

The lower the value of Root Mean Square Prediction Error,
the better the goodness of fit [8].

Data Analyses

For DS-1

The parameter estimation and comparison criteria results for
DS-1 of all the models under consideration can be viewed
through Table I(a) and I(b). It is clear from the tables that the
value of R^2 for SRGM-3 is higher and the value of MSE is
lower in comparison with the other models, so SRGM-3
provides a better goodness of fit for DS-1.

Table I(a). Model Parameter Estimation Results (DS-1)

Table I(b). Model Comparison Results (DS-1)

For DS-2

The parameter estimation and comparison criteria results for
DS-2 of all the models under consideration can be viewed
through Table II(a) and II(b). It is clear from the tables that
the value of R^2 for SRGM-3 is higher and the value of MSE
is lower in comparison with the other models, so SRGM-3
provides a better goodness of fit for DS-2.

| SRGM-1 | 388 | 202 | 222 | - | - | зә | 368 |
| SRGM-2 | 467 | 602 | 56 | . | T= | L156 | 001 |
| SRGM-3 | 362 | .352 | 281 | 5 | 3 | 567 | .063 |

Table II(a). Model Parameter Estimation Results (DS-2)

Table II(b). Model Comparison Results (DS-2)
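
The comparison criteria of Section 5 (MSE, R^2, Bias, Variation and RMSPE) can be computed from a list of observed cumulative fault counts and the corresponding fitted values. The sketch below is only an illustration: the function name and the two short data lists are hypothetical and are not the DS-1 or DS-2 data.

```python
import math


def fit_criteria(observed, predicted):
    """Return MSE, R^2, Bias, Variation and RMSPE for a fitted model."""
    k = len(observed)
    # Prediction error PE_i = observation minus prediction.
    pe = [y - m for y, m in zip(observed, predicted)]
    residual_ss = sum(e * e for e in pe)
    mean_y = sum(observed) / k
    corrected_ss = sum((y - mean_y) ** 2 for y in observed)
    bias = sum(pe) / k
    variation = math.sqrt(sum((e - bias) ** 2 for e in pe) / (k - 1))
    return {
        "MSE": residual_ss / k,
        "R2": 1.0 - residual_ss / corrected_ss,
        "Bias": bias,
        "Variation": variation,
        "RMSPE": math.sqrt(bias ** 2 + variation ** 2),
    }


# Hypothetical cumulative fault counts and fitted values.
observed = [10, 18, 25, 30, 34]
predicted = [11, 17, 24, 31, 35]
criteria = fit_criteria(observed, predicted)
```

A model is preferred when it gives a higher R^2 together with lower MSE, Bias, Variation and RMSPE on the same data set.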


6. GOODNESS OF FIT CURVES 


For DS-1

Goodness of fit curves

7. CONCLUSION 

In this paper we have developed a distribution based approach
to the change-point problem with two types of imperfect
debugging in software reliability. With this approach, we can
derive existing models and propose new models. All these
models have been validated and verified using real data sets.
Parameter estimates, comparison results and goodness of fit
curves have also been presented.


8. FUTURE SCOPE 

In future, we will try to develop more models along the same
lines using Erlang, normal, Weibull and gamma distribution
functions. The models can also be extended to the multiple
change-points problem.

ABSTRACT

In modern society home and office automation has become
increasingly important, providing ways to interconnect
various home appliances. This interconnection results in
faster transfer of information within homes/offices, leading
to better home management and improved user experience.
Home automation, in essence, is a technology that
integrates various electrical systems of a home to provide
enhanced comfort and security. Users are granted
convenient and complete control over all the electrical home
appliances and are relieved from tasks that
previously required manual control. This paper tracks the
development of home automation technology over the last
two decades. Various home automation technologies are
explained briefly, giving a chronological account of the
evolution of one of the most talked about technologies of
recent times.

KEYWORDS 

Home Automation Network, Wireless Control, Internet 
based Control 


1. INTRODUCTION 


At the advent of the 1990s the average house started to have
interaction with many electronic devices. There were regular
electric appliances such as the refrigerator, electronic
appliances such as the television, communication appliances
such as the telephone, and information appliances such as
the computer. The functioning of all these appliances
required a dedicated wiring system, so a normal residential
environment had various wiring systems including power
wiring, telephone wiring, and cable TV wiring. Some homes
also had additional wiring for home security, a PC local area
network, etc. All these systems used different types of
communication media and carried different types of signals
completely independent of each other. At the same time, due
to great advancements in IC technology, computing costs
declined sharply and miniaturization gained momentum,
making a dedicated microprocessor a common part of home
appliances and raising their level of intelligence.

But this intelligence had not been utilized to its true potential
as these appliances operated in complete isolation from each
other. Under this scenario the need for a unified "home
network" was felt, keeping in mind the various advantages it
would offer, such as (i) ease of use and convenience, as an
appliance can be controlled from different locations,
(ii) sharing of information, and (iii) minimum wiring
confusion and low cost [1].

The working principle of an automated home is explained in
section 2 of the paper. Some of the early developments in
the field of home automation technology are detailed in
section 3. Section 4 focuses on the recent developments in
home automation. As home automation technology grows,
serious concerns are arising about its security. Section 5 of
the paper is dedicated to the work done on enhancing the
security of the automated home.


2. WORKING OF AN AUTOMATED HOME 

The key to control of appliances in an automated home lies
in the ability of the products to communicate. The nature of
these devices in a home network is very similar to that of 
other networks such as a computer network. Each switch or 
module has a unique “address”. When a control signal is 
broadcast through the network, all of the modules in the 
network can hear the commands, but only those to which the 
signal is addressed will respond to it. 
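
The addressing behaviour described above can be sketched in a few lines. The example is a toy model, not an implementation of any particular home automation protocol; the class names, addresses and commands are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A switch or appliance module with a unique address on the network."""
    address: str
    state: str = "off"
    heard: int = 0  # number of broadcasts this module has received

    def receive(self, target, command):
        self.heard += 1              # every module hears every broadcast
        if target == self.address:   # only the addressed module responds
            self.state = command


@dataclass
class HomeNetwork:
    modules: list = field(default_factory=list)

    def broadcast(self, target, command):
        for module in self.modules:
            module.receive(target, command)


network = HomeNetwork([Module("lamp-1"), Module("lamp-2"), Module("fan-1")])
network.broadcast("lamp-2", "on")
states = {m.address: m.state for m in network.modules}
```

After the broadcast every module has heard the command once, but only the module whose address matches has changed state.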

The majority of commands in conventional homes are 
passed on to the device in question through the use of a 
physically operated switch. Generally pressing a switch or 
turning a dial directly alters the supply of electricity to a 
device. The switch or knob opens or closes an electrical 
connection or varies the resistance of that connection. Fig.1 
illustrates this using the example of a typical lighting circuit. 
The lamp in the circuit is linked to a separate switch that is 
able to interrupt the flow of electricity to the light fittings. 

In an automated home the switch takes on a different 
function. Rather than regulating the flow of electricity, the 
switch merely sends a signal to a communication network, 
called a bus system, informing the network of the new 
position of the switch as shown in Fig. 2. A controller fitted 
to a single light fitting, or a number of light fittings, receives 
this signal, recognizes that the message is intended for it and 
responds, in this example by turning on the light. Therefore 
the regulation of electrical flow takes place at the controller 
rather than at the switch. If the bus system connects more
than just lights, it is possible to radically change the way the
home is controlled. The switch is no longer directly related
to any particular device, so it can operate any device on the
network that has been told to respond to the signal from the
switch [3]. So multiple lights, possibly in different rooms,
could be controlled and even dimmed to different levels as
illustrated in Fig. 3. In addition, it can also be seen from
Fig. 3 that a single switch may be used to control various
home appliances.
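
The decoupling of switch and device can be illustrated with controllers that are each told which switch signals to respond to and at what level. This is a minimal sketch under assumed names; the switch identifiers, device names and dim levels are hypothetical.

```python
class Controller:
    """Controller fitted to a device; regulates it in response to bus signals."""

    def __init__(self, name, responds_to):
        self.name = name
        # Maps a switch id to the level (0-100) used when that switch is on.
        self.responds_to = dict(responds_to)
        self.level = 0

    def on_signal(self, switch_id, position):
        if switch_id in self.responds_to:
            self.level = self.responds_to[switch_id] if position == "on" else 0


def send_switch_signal(controllers, switch_id, position):
    """The switch only reports its new position on the bus."""
    for controller in controllers:
        controller.on_signal(switch_id, position)


controllers = [
    Controller("hall light", {"S1": 100}),
    Controller("bedroom light", {"S1": 40}),
    Controller("fan", {"S2": 100}),
]
send_switch_signal(controllers, "S1", "on")
levels = {c.name: c.level for c in controllers}
```

One switch signal drives two lights to different levels while the fan, which was not told to respond to that switch, is unaffected.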


3. EARLY DEVELOPMENTS

Sensing the advantages associated with home automation
networks, research and development projects started on a
large scale around the world, but the absence of a standard
for the networking of home appliances was the major
roadblock. It was removed in 1992 with the development
of the Consumer Electronic Bus, or CEBus, by the Electronic
Industries Association of America [2].

The CEBus standard includes specifications for a layered 
network architecture based on the Open Systems 
Interconnection model, with network layer protocols for the 
Physical, Data link, Network, and Application layers. The 
main advantage of this standard is that the Physical Layer 
supports six different transmission media, namely twisted 
pair, coaxial cable, power line, infrared, radio frequency, and 
fiber optics. Thus the best physical medium for a given 
application can be selected. Cross & Douligeris [4, 5] 
proposed that fiber optics may be the best medium for home 
automation network because with fiber, the capabilities of 
the home automation system can be expanded to include 
many more functions, leading to complete home integration. 
They observed that although CEBus includes fiber optics as
one of the physical media, it does not specify the
configuration of the fiber optic network. Therefore they
designed a fiber optics based home automation network which
offered various advantages such as (i) increased bandwidth,
(ii) immunity to electromagnetic noise, (iii) ease of
installation, and (iv) safety from electric shock hazards. The
designed network also had some drawbacks: its cost was
higher, and because optical fiber cannot carry direct current,
an alternate source of energy was required.

During the early part of the 1990s, consumer electronics
devices evolved into digital format, so the need was felt to
interconnect these home appliances through digital links to
preserve the fidelity of the information transmitted. Chen [6]
proposed a home automation network for this purpose. Apart
from the digital link, the main feature of the proposed home
automation network was the Digital Access system, which
allowed the home network to communicate with the outside
world as well. Chen advocated the use of IEEE 1394 for the
proposed network as it can handle both data and isochronous
traffic well at a data rate above 100 Mbps.

Until 1993, the home automation networks developed
employed guided or wired media for interconnection of
appliances. However, Fujieda [7] felt that for achieving
complete marketability, home networks should be easy to
install not only in newly built houses but also in existing
houses, so it would be desirable to build up the network
without any extra wiring. Therefore he advocated the use of
wireless media for home networking and called such
networks wireless home networks. For the wireless home
network he proposed the use of the 400 MHz specified low
power (SLP) band. He developed a low power and small size
RF section for the SLP band and a communication protocol,
and demonstrated the proposed wireless network to be viable.
Using the prototype he also implemented a couple of
application systems: a maintenance system for instantaneous
gas water heaters and a health promotion system with
chronic disease prevention.


4. RECENT DEVELOPMENTS 
Early generation appliances typically relied on a hard wired
connection to a desktop computer in order to communicate
with the outside world, but during the past decade great
advancements in Internet, mobile telephony and TCP/IP
technologies have resulted in many appliances having their
own inbuilt communication transceivers: Infra Red, 802.11b,
Bluetooth and GSM/GPRS. The development of new
physical layer technologies has also resulted in reliable
transfer of data at a much faster rate. All these developments
have changed the face of home automation technology as well.
A critical analysis of some of the recent research and
development efforts in the field of home automation is
presented in this section.
In 2003 Hiroshi Kanma and others [8] observed that
although the rapid spread of the Internet at home may provide
a convenient way of implementing a home network and its
control, certain hindrances had to be removed to make home
automation common, such as (i) the initial cost of introducing
the home network system and the control terminal equipment,
(ii) the difficulty of simultaneously replacing all home
appliances for networking, and (iii) the lack of mobility in
the control terminal. To solve these problems Kanma
proposed the use of Bluetooth as the communication medium
and a cellular phone as the terminal equipment. A
communication adapter was attached to the home appliances
to provide Bluetooth communication functionality, which
eliminated the need to purchase new appliances for the home
network. A simplified overview of the proposed network is
shown in Fig. 4. In addition they postulated that the cellular
phone would provide a short start up time and that its ability
to access the Internet could provide certain other useful
functionalities and services. Hardware and software for these
adapters, Java applications running on the cellular phone,
and the interface software between the Java applications and
the adapter were made for the prototype. Further
developments in this direction have been done on various
Bluetooth kits/boards produced by Man n Tel, Korea [9,10].
At the same time Tajika and others [11] articulated that
home network technology was focused primarily on how
data and access protocols on the Internet can be utilized in
the home network by converting them to home network
protocols through a home gateway. However, they felt that
apart from control, other novel services can be provided
to the home through the Internet, resulting in a better user
experience without any loss of flexibility and portability.
The system proposed by them was composed of networked
home appliances such as refrigerators, microwave ovens, air
conditioners, and washing machines, a Bluetooth access
point and a home terminal. Bluetooth units were embedded
within all the appliances and these were connected to the
Internet. The home terminal was connected to the home
network and the Internet; it provided a well designed GUI to
the user through a touch panel and voice recognition, and it
also worked as a gateway between the home network and the
Internet service provider. The overview of the proposed
network is shown in Fig. 5. The authors developed some
actual functions for each home appliance, such as cooking
mode/timer settings for a microwave oven or monitoring
stocks through a sensor in the refrigerator. ECHONET ver3,
a specification for controlling/monitoring functions in home
appliances, was included in the system. It defines the
control/monitoring interface of functions for white
appliances, sensors and healthcare appliances.

In 2005 Hayong Oh and others [13] highlighted the
importance of an energy efficient routing scheme for the
sensors placed in the home. According to the authors, in the
emerging automated home, sensors are required to be placed 
everywhere in the house to collect various physical data such 
as temperature, humidity, and light to provide information to 
various appliances. For example, the heating system senses 
the temperature of the home and controls it according to the 
weather. The authors argued that in the conventional sensor 
routing scheme each sensor node detects an event and 
broadcasts the event to all sensor nodes within one hop range 
from where all the nodes broadcast the message to the next 
nodes. This process is recursively performed until the event 
reaches the base station. This scheme leads to excessive 
drain of battery power and as these sensor nodes have 
limited battery power, an energy efficient sensor routing 
scheme is critical for the successful implementation of home 
network. Therefore they proposed a new sensor routing
scheme for home automation networks and called it
RDSR (Relative Direction based Sensor Routing). The
proposed scheme divides the home area into sectors and
assigns a manager node to each sector. The manager node
receives collected data from the sensor devices in its sector
and then transfers the data to the base station through the
shortest path in 2-dimensional (x, y) coordinates. The
proposed scheme was shown to be energy efficient.

In 2006 Mario Kolberg and Evan H. Magill [14] addressed 
the control of complex networked appliances. Currently a 
standard computer interface is most often used to configure 
and remotely control these appliances. However the authors 
argued that this is unsuitable for the target audience which is 
often inexperienced with the use of computers. Therefore 
they proposed Anoto-enabled pen and paper as a suitable
alternative, as users are highly familiar with pen and paper
and will find it suitable for control. In the proposed
system, Bluetooth and the mobile telecommunication network
are used to transfer data to a service provider where it is
processed and sent to the user's home. It was shown that the
approach can be used to control a number of different
appliances in the home and also outside the home.


5. SECURITY ISSUES 

In 2003 H. Nakakita and others [12] addressed the problem
of security of the wireless home network. They observed that,
considering the advantages offered by Bluetooth as a
communication medium in wireless home networks, it is
expected to be applied to all home networks in the near
future. However, they felt that its widespread acceptance will
result in multiple home networks placed close to one
another and operating simultaneously in an
overlapped area. In these circumstances, a wireless home
network may encounter serious security problems like
eavesdropping from outside the home or masquerading as a
member of the network. The authors proposed several
requirements for a secure home network: (i)
separation of communication between the inside and outside
of the home network, (ii) prevention of eavesdropping from
outside the network, (iii) an easy method of adding new
wireless appliances to the network, and (iv) a way of
deregistering unused, stolen, or discarded appliances from
the home network.

The authors proposed a system fulfilling the above security 
requirements. The proposed network was a server based 
system to manage the wireless home appliances through the 
use of existing frameworks with encryption function in the 
data link layer. The system assigned a unique master key to 
each appliance and some shared network keys. The shared 
network key was periodically updated in order to ensure the 
security of the home network. 


6. CONCLUSIONS AND SCOPE FOR FURTHER 
WORK 

The journey of home automation technology has been 
critically investigated in this paper. It is observed that from 
simple interconnection and combination of household 
appliances it has evolved into a powerful technology for the 
networking of home appliances for the purpose of not only 
remote control but also for adding intelligence to the home 
and providing novel services resulting in great improvement 
in user experience. Furthermore, the revolutionary 
developments in the fields of high speed computing devices 
along with TCP/IP based internet, wireless and mobile 
communication have really helped in the rapid growth of 
home automation. The use of these technologies in home 
automation and networking is definitely going to increase in 
the years to come. In the Indian context, this field is still in 
its infancy and deserves to be pursued rigorously. 
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ABSTRACT 

In this era of digital computing, the interest in and necessity of 
representing information in visual forms has become very 
important. Due to considerable improvement in computing 
and network technologies, and the availability of better 
bandwidths, the past few years have seen a considerable rise 
in the accessibility, sophistication, and transmission of 
digital images using imaging technologies like digital 
cameras, scanners, photo-editing, and software-packages. 
However, this technology is also being used for 
manipulating digital images and creating forgeries that are 
difficult to distinguish from authentic photographs. 
Tampering of images involves pasting one part of an image 
onto another one, skillfully manipulated to avoid any 
suspicion. Any image manipulation can become a forgery, 
based upon the context in which it is used. The sophisticated 
and low-cost tools of the digital age enable the creation and 
manipulation of digital images without leaving any 
perceptible traces. As a result, the authenticity of images 
can't be taken for granted, especially when it comes to legal 
photographic evidence. Manipulations on an image 
encompass processing operations such as scaling, rotation, 
brightness adjustment, blurring, contrast enhancement, etc. 
or any cascade combinations of them. Thus the problem of 
establishing image authenticity has become more complex 
with easy availability of digital images and free 
downloadable image editing softwares leading to 
diminishing trust in digital photographs. Detecting forgery in 
the digital images is one of the challenges of this exciting 
digital age. A lot of research is underway to detect and 
prevent forgery in digital images. One of the problems in 
web based image applications is non-availability of original 
image for evaluation. Further, digital imagery 
authentication techniques based on cryptographic principles 
and digital signatures offer no modification protection 
following image transmission. In this paper, we study the 
major approaches to detect forgery in digital images. 
Initially, the process of digital image tampering is explained. 
Subsequently, we analyze some recent algorithms for 
detecting digital forgery, including copy-move detection, chromatic 
aberration, PCA for detecting duplicated image regions, and lighting 
inconsistencies. Preliminary investigations show that 
different algorithms have different domains of tampering 
detection and have different merits and demerits. The 
decision about the content authenticity is complex and can 
be better established by interpreting the results obtained by 
applying a set of these methods. 
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1. INTRODUCTION 

An image is a two-dimensional function, f(x, y), where x and 
y are spatial (plane) coordinates and the value of f(x,y) at 
any pair of coordinates (x,y) is called the intensity or gray 
level of the image at that point. An image contains a lot of 
information and can be monochromatic or colored. When 
digital technology is used to capture, store, modify, or view 
images, they must first be converted into numbers: 1s and 0s, 
called bits. A combination of eight bits is called a byte. A 
digital image is composed of a finite number of elements 
which are referred to as pixels. A pixel is a basic unit of a 
colored or monochromatic image on a computer display or 
in a computer generated image. A color image 
of size 1024 x 1024 pixels with 256 colors (8 bits per 
pixel) occupies 1 MB of disk or RAM space uncompressed, 
whereas the same image at 24 bits per pixel occupies 3 MB. 
Since a colored image contains more information (coloring 
details), its file size is much larger than that of a 
monochrome image. Digital images are typically stored in either 
24-bit or 8-bit files. Color variations for the pixels are 
derived from three primary colors: red, green, and blue. 
Each primary color is represented by 1 byte; 24-bit images 
use 3 bytes per pixel to represent a color value. These 3 
bytes can be represented as hexadecimal, decimal, and 
binary values [3]. 
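
The storage arithmetic and the three notations for a 24-bit color value can be checked with a short sketch. The function names `raw_image_bytes` and `describe_rgb` are hypothetical and chosen only for illustration; the sizes assume uncompressed storage with no file header.

```python
def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a width x height image (no header)."""
    return width * height * bits_per_pixel // 8


def describe_rgb(red: int, green: int, blue: int) -> dict:
    """Write one 24-bit color value as hexadecimal, decimal and binary."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each primary color must fit in one byte (0-255)")
    return {
        "hex": f"#{red:02X}{green:02X}{blue:02X}",
        "decimal": (red, green, blue),
        "binary": f"{red:08b} {green:08b} {blue:08b}",
    }


if __name__ == "__main__":
    print(raw_image_bytes(1024, 1024, 8))   # 8 bits per pixel: 1048576 bytes (1 MB)
    print(raw_image_bytes(1024, 1024, 24))  # 24 bits per pixel: 3145728 bytes (3 MB)
    print(describe_rgb(255, 128, 0))
```

The same pixel count therefore costs three times as much at 24 bits per pixel as at 8 bits per pixel.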

In contrast to analog signal processing in which the image 
signal is treated as a continuous signal, digital image 
processing has many advantages. It allows a much wider 
range of algorithms to be applied to the input data and can 
avoid problems such as the build-up of noise and signal 
distortion during processing. Digital image formation, the 
foremost step in any digital image processing application, 
consists basically of an optical system, a sensor and a 
digitizer. The optical signal is usually transformed to an 
electrical signal by using a sensing device (e.g. a Charge 
Coupled Device sensor). The analog signal is transformed to 
a digital one by using a video digitizer (frame grabber). 
Thus, the optical image is transformed to a digital one. Due 
to inherent limitations of the processing systems, each digital 
image formation subsystem may introduce a deformation or 
degradation to the digital image (e.g. geometrical distortion, 
noise, non-linear transformation etc.). The mathematical 
modeling of the digital image formation system is very 
important in order to have precise knowledge of the 
degradations introduced. After conversion of the image to 
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binary data stream, it is put back together in a grid of small 
squares. These tiny squares, also called the sample space, are the 
pixels, the building blocks of all computer 
graphics and images. The value of a pixel indicates the 
intensity level associated with that pixel. 
Powerful image processing and editing software is now widely 
available, and with its help digital images can be easily 
manipulated. Many of these packages are freely available and 
often do not require any special skills to operate. A digital 
image can be enlarged or enhanced; backgrounds, color contrasts 
and color schemes can be altered; even facial features can be 
changed to resemble another person. Images can be converted from 
one image format to another, and any part of an image can be 
altered pixel by pixel. Before the digital age, it was fairly 
easy to detect altered photographs. With the advent of 
commercial software, however, tampering with photographs has 
become very easy and can be carried out without any obvious 
signs, so it is becoming harder to tell authentic images from 
forged ones. With the 
increased reliance on digital images for information, the 
need to ensure their authenticity increases as well. Research 
in the field of image authenticity is still in its infancy. 
Recently, research on digital image forensics has gained 
attention by addressing forgery detection and image source 
identification. Both static images and video can be 
manipulated. However, in the current paper, we discuss 
digital forgeries related to static digital images 
only. 
Any image manipulation can become a forgery, depending upon 
the context in which it is used. An image altered for fun, or 
a bad photograph altered to 
improve its appearance, cannot be considered a forgery even 
though it differs from its original capture. On the 
other hand, some people create forgeries for gain and 
prestige, to make the recipient believe that the image is 
real and not a fake. Three types of forgeries can be 
identified: 

a) Using Graphical Software is one method by which a 
forged image can be created. It needs a skilful 
creator who can ensure that the image being created is 
realistic, e.g. that the fall of light on objects 
is consistent right across the image, that shading is 
consistent, and likewise the absorption of light by an object, etc. An 
image created using this method takes some time to 
develop. 

b) Creating an image by altering its Content is another 
method. In this, the recipient is duped into believing that the 
objects in an image are something other than what they 
really are. The image itself is not altered, and if 
examined will be proven so. 

c) Creating an image by altering its Context is the third 
method. In this, objects are added to or removed from an image, 
resulting in copy-move forgeries; e.g. a person can be added 
or removed. The easiest way is to cut an object from one 
image and insert it into another image. Various image / 
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photo editing softwares like Adobe Photoshop, XnView, 
ProShow Gold etc. make this a simple task [6]. 
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Figure 1. Example of a Digital Image Forgery 


An example of a digital forgery is shown in Figure 1. As the 
newspaper cutout shows, three different photographs were 
used in creating the composite image: images of the White 
House, Bill Clinton, and Saddam Hussein. The White House 
was rescaled and blurred to create an illusion of an out-of- 
focus background. Then, Bill Clinton and Saddam Hussein were cut 
out of two different images and pasted onto the White 
House image. Care was taken to bring in the speaker stands 
with microphones while preserving the correct shadows and 
lighting. Figure 1 is, in fact, an example of a very realistic 
looking forgery [7]. 

With this increased reliance on digital images for 
information, the need to ensure their authenticity increases 
as well. The manipulation of images through forgery 
influences the perception an observer has of the depicted 
scene, potentially resulting in ill consequences if created 
with malicious intentions. This poses a need to verify the 
authenticity of images originating from unknown sources in 
absence of any prior digital watermarking or authentication 
technique. Authentication of digital images plays an 
important role in forensic investigation, criminal 
investigation, insurance processing, surveillance systems, 
intelligence services and journalism. 

Quite a few techniques have been proposed for 
combating the tampering of digital images. In one, the digital 
camera computes a cryptographic hash of the image and 
encrypts the hash using the private component of a key 
that is built into the camera. The encrypted hash is then 
stored along with the digital image. Another complementary 
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approach is to use digital time-stamping / digital signatures. 
These schemes effectively protect the data from 
modification during transmission, but they offer no 
protection following transmission, since the information 
needed for these schemes to perform the authentication is 
separate from the data. An attacker can simply modify the 
data, recalculate the message digest or digital signature, 
and attach them together. Without knowledge of the original 
data or of the original authentication information, it is 
impossible to contest the authenticity of the modified digital 
image. Since the value of a digital image is based on its 
content, the image bits can instead be modified to embed codes 
without changing the meaning of the content. Once the codes 
are embedded in the data content, any manipulation of the data 
will also modify these codes, so the 
authenticator can examine them to verify the integrity of the 
data [8]. 
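
The weakness of authentication data that travels separately from the pixels can be illustrated with a minimal sketch. It assumes an unkeyed SHA-256 digest stored beside the data; the helper names `attach_digest`, `is_consistent` and `forge` are hypothetical. An accidental change is caught, but an attacker who recomputes the digest produces a package that checks out.

```python
import hashlib


def attach_digest(image_bytes: bytes) -> tuple:
    """Bundle the data with its SHA-256 digest, kept separate from the pixels."""
    return image_bytes, hashlib.sha256(image_bytes).hexdigest()


def is_consistent(package: tuple) -> bool:
    """Check that the stored digest matches the data it accompanies."""
    data, stored_digest = package
    return hashlib.sha256(data).hexdigest() == stored_digest


def forge(new_bytes: bytes) -> tuple:
    """An attacker replaces the data and simply recomputes the digest."""
    return attach_digest(new_bytes)


if __name__ == "__main__":
    original = attach_digest(bytes([10, 20, 30, 40]))
    corrupted = (bytes([10, 20, 31, 40]), original[1])  # data changed, old digest kept
    forged = forge(bytes([99, 20, 30, 40]))             # data changed, digest redone
    print(is_consistent(original), is_consistent(corrupted), is_consistent(forged))
```

Only a verifier that still holds the original digest, or a scheme whose key the attacker lacks, can distinguish the forged package from the original.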

The widely used approach to verify an image's authenticity 
is to embed checksums into the least significant bits (LSB) 
of the image. A secret numeric key known by both the 
sender and the recipient protects these checksums. Another 
cost-effective way to authenticate a picture is through the use 
of metadata. The information gathered from 
metadata cannot stand on its own, as metadata is not strictly 
bound to a file, but it can provide useful information if it is 
used in the proper context. 
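
A keyed check embedded in the least significant bits can be sketched as follows. This is an illustration under stated assumptions rather than any published scheme: HMAC-SHA256 stands in for the keyed checksum, the image is a flat list of 8-bit grayscale values with at least 256 pixels, and the names `embed` and `verify` are hypothetical.

```python
import hashlib
import hmac

TAG_BITS = 256  # length of an HMAC-SHA256 tag in bits


def _tag_bits(pixels, key: bytes):
    """Keyed tag over the seven most significant bits of every pixel."""
    content = bytes(p & 0xFE for p in pixels)  # LSBs are ignored by the tag
    tag = hmac.new(key, content, hashlib.sha256).digest()
    return [(byte >> (7 - i)) & 1 for byte in tag for i in range(8)]


def embed(pixels, key: bytes):
    """Return a copy whose first 256 LSBs carry the keyed tag."""
    if len(pixels) < TAG_BITS:
        raise ValueError("image too small to hold the tag")
    marked = list(pixels)
    for index, bit in enumerate(_tag_bits(pixels, key)):
        marked[index] = (marked[index] & 0xFE) | bit
    return marked


def verify(pixels, key: bytes) -> bool:
    """Recompute the tag and compare it with the embedded LSBs."""
    expected = _tag_bits(pixels, key)
    return all((pixels[i] & 1) == expected[i] for i in range(TAG_BITS))


if __name__ == "__main__":
    image = [(i * 7) % 256 for i in range(1024)]
    marked = embed(image, b"shared secret")
    tampered = list(marked)
    tampered[500] ^= 0x10  # alter one pixel above the LSB plane
    print(verify(marked, b"shared secret"), verify(tampered, b"shared secret"))
```

Each pixel changes by at most one gray level, so the mark is visually negligible. In this sketch, changes confined to the LSBs of pixels that do not carry the tag go unnoticed.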

The process of detecting image tampering is supposed to be 
carried out in six stages. The first five stages correspond to 
major theoretical goals of the process; the last one is related 
to real-life applications: a) blind method for resampling 
detection, b) blind method for duplicated regions detection, 
c) detection of discrepancies in lighting conditions and 
brightness levels, d) automatic method for detection of 
double JPEG compression, e) detection of inconsistent noise 
patterns, f) system integration and testing. Overall, these 
methods proved encouraging in detecting image forgeries 
with an observed accuracy of 60%. 

Digital watermarks have also been proposed as a means for 
fragile authentication, content authentication, detection of 
tampering, localization of changes and recovery of original 
content. While digital watermarks can provide useful information 
about an image, the watermark must be present in the image 
before the tampering occurs. This limits their application to 
controlled environments that include military systems or 
surveillance cameras. Unless all digital acquisition devices 
are equipped with a watermarking chip, it is unlikely 
that a forgery in the wild will be detectable using a 
watermark. It might be possible, but very difficult, to use 
unintentional camera "fingerprints" related to sensor noise, 
color gamut, and / or dynamic range to discover 
tampered areas in images. Another possibility for blind 
forgery detection is to classify textures that occur in natural 
images using statistical measures and find discrepancies in 
those statistics between different portions of the image. At 
this point, however, it appears that such approaches will 
produce a large number of missed detections as well as false 
positives. 
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In this research work we have studied the techniques and 
methods of Digital Image Forgery Prevention and Detection 
Mechanisms. Also, we have reviewed the forgery detection 
method using Block Matching techniques of Copy-move 
algorithm [7, 11]. In the next section, we discuss some of the 
algorithms which have been presented by different 
researchers for detection of digital image tampering. Under 
Results & Discussion, we investigate and comparatively 
analyze some of the algorithms on the basis of the merits, 
demerits, input, output and space & time complexity. We 
present the conclusion and the future directions in which we 
are working. 


2. LITERATURE REVIEW 

The sophisticated and low-cost tools of the digital age enable 
the creation and manipulation of digital images without 
leaving any perceptible traces. As a result, the authenticity of 
images can't be taken for granted, especially when it comes 
to legal photographic evidence. Manipulations on an image 
encompass processing operations such as scaling, rotation, 
brightness adjustment, blurring, contrast enhancement, etc. 
or any cascade combinations of them. Doctoring images also 
involves pasting one part of an image onto another, 
skillfully manipulated so as to avoid any suspicion. One 
effective tool for providing image authenticity and source 
information is digital watermarking. 

These digital watermarks also offer forgery detection. 
Several watermarking techniques have been proposed. One 
uses a checksum on the image data which is embedded in the 
least significant bits of certain pixels. Others add a maximal 
length linear shift register sequence to the pixel data and 
identify the watermark by computing the spatial cross- 
correlation function of the sequence and the watermarked 
image. Watermarks can be image dependent, using 
independent visual channels, or generated by modulating 
JPEG coefficients. These watermarks are designed to be 
invisible, or to blend in with natural camera or scanner 
noise. Visible watermarks also exist. In addition to these, a 
visually undetectable, robust watermarking scheme has 
come into existence which can detect the change of a single 
pixel and can locate where the changes occur. The 
algorithms work for color images and can accommodate 
JPEG compression [9]. 

The embedding of a watermark during the creation of the 
digital object limits it to applications where the digital object 
generation mechanisms have built-in watermarking 
capabilities. Therefore, in the absence of widespread 
adoption of digital watermarking technology, it is necessary 
to resort to image forensic techniques. Image forensics can 
reconstitute the set of processing operations to which the 
image has been subjected. In turn, these techniques not only 
enable us to make statements about the origin and 
authenticity of digital images, but also may give clues as to 
the nature of the manipulations that have been performed. 
One such image forensic scheme is based on the interplay 
between feature fusion and decision fusion in which three 
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categories of features are considered, namely, the binary 
similarity measures between the bit planes, the image quality 
metrics applied to denoised image residuals, and the 
statistical features obtained from the wavelet decomposition 
of an image. These forensic features were tested against the 
background of single manipulations and multiple 
manipulations, as would actually occur in doctoring images 
[10]. 

The availability of powerful digital image processing 
softwares, such as PhotoShop, XnView, ProShow Gold, 
makes it relatively easy to create digital forgeries from one 
or multiple images. Over the past few years the field of 
digital forensics has emerged to detect various forms of 
tampering. A common manipulation in tampering with an 
image is to copy and paste portions of the image to conceal a 
person or object in the scene. Another possibility for blind 
forgery detection is to classify textures that occur in natural 
images using statistical measures and find discrepancies in 
those statistics between different portions of the image. At 
this point, however, it appears that such approaches will 
produce a large number of missed detections as well as false 
positives [7]. 

Another efficient technique which can automatically detect 
and localize duplicated regions in an image, works by first 
applying a Principal Component Analysis (PCA) on small 
fixed-size image blocks to yield a reduced dimension 
representation. This representation is robust to minor 
variations in the image due to additive noise or lossy 
compression. Duplicated regions are then detected by 
lexicographically sorting all of the image blocks [11]. This 
technique is effective on plausible forgeries, and its 
sensitivity to JPEG lossy compression and 
additive noise has been quantified. Detection is possible even 
in the presence of significant amounts of corrupting noise. 

Building specifically on this work, and more broadly on all 
of these forensic tools, a new lighting-based digital forensic 
technique came into existence. While creating a digital 
composite of two or more people, it is often difficult to 
match the lighting conditions under which each person was 
originally photographed and the lighting effects due to 
directional lighting (e.g., the sun on a clear day). At least one 
reason for this is that such a manipulation may require the 
creation or removal of shadows and lighting gradients. To 
the extent that the direction of the light source can be 
estimated for different objects / people in an image, lighting 
inconsistencies can therefore be a useful tool for revealing 
traces of digital tampering [12]. 

Also, a newly developed forensic tool came into existence 
that exploits imperfections in a camera's optical system. 
When creating a digital forgery, it is sometimes necessary to 
conceal a part of an image with another part of the image or 
to move an object from one part of an image to another part 
of an image. These types of manipulations will lead to 
inconsistencies in the lateral chromatic aberrations, which 
can therefore be used as evidence of tampering [13]. This 
current approach only considers lateral chromatic 
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aberrations. The efficacy of this approach is seen in 
detecting tampering in synthetic and real images. 

As usual, all of these techniques will be vulnerable 
to countermeasures that can hide traces of 
tampering. Nevertheless, each technique, in conjunction with a growing 
body of other forensic tools, is effective in exposing digital 
forgeries. 


3. BRIEF MATHEMATICAL REVIEW 

The prerequisites of forgery detection using a copy-move 
algorithm are that the match process completes in 
finite and reasonable time and that it allows an approximate 
match of small image segments. Since any digital image can 
be considered as an M x N array of pixels with certain 
associated intensities, any tampering of the copy-move type 
introduces a correlation between the original image segment and the 
pasted one. This correlation can be used to detect the 
tampering. Primarily, there are two approaches used to find 
the approximate block matching: 

1. Exhaustive Search 

2. Autocorrelation 

Exhaustive Search: In this method, the image and its 
circularly shifted version are overlaid, looking for closely 
matching image segments. Let us assume that $x_{ij}$ is the pixel 
value of a grayscale image of size M x N at the position (i, j). 
In the exhaustive search, the following differences are 
examined: 

$$\Delta x_{ij} = \left| x_{ij} - x_{(i+k) \bmod M,\ (j+l) \bmod N} \right|, \quad k = 0, 1, \ldots, M-1, \quad l = 0, 1, \ldots, N-1, \quad \text{for all } i \text{ and } j.$$

It is easy to see that comparing $x_{ij}$ with its cyclical shift $[k, l]$ 
is the same as comparing $x_{ij}$ with its cyclical shift $[k', l']$, 
where $k' = M - k$ and $l' = N - l$. Thus, it suffices to inspect only 
those shifts $[k, l]$ with $1 \le k \le M/2$, $1 \le l \le N/2$, thus cutting the 
computational complexity by a factor of 4. 
Finding the correct threshold value $t$ is challenging because 
even in natural images there may be a large number of pixel 
pairs that produce differences below the threshold. 
However, the differences $\Delta x_{ij}$ can be examined to 
set a proper threshold value based on the requirements, 
complexity and results. 
The comparison and image processing require on the order of 
MN operations for one shift. Since there are on the order of MN 
shifts, the total computational 
requirements are proportional to $(MN)^2$. 
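
The exhaustive search can be sketched with NumPy as below. This is a minimal illustration, not the published implementation: the function name `exhaustive_shift_search` and the `min_matches` cut-off are assumptions introduced here, and the loop covers only the reduced range of shifts stated in the text.

```python
import numpy as np


def exhaustive_shift_search(image, threshold, min_matches):
    """List cyclic shifts (k, l) under which many pixels nearly match.

    For each shift, the image is compared with its circularly shifted copy
    and the pixels whose absolute difference is at most `threshold` are counted.
    """
    x = np.asarray(image, dtype=np.int64)
    rows, cols = x.shape
    hits = []
    for k in range(1, rows // 2 + 1):
        for l in range(1, cols // 2 + 1):
            # shifted[i, j] == x[(i + k) % rows, (j + l) % cols]
            shifted = np.roll(x, shift=(-k, -l), axis=(0, 1))
            count = int(np.count_nonzero(np.abs(x - shifted) <= threshold))
            if count >= min_matches:
                hits.append((k, l, count))
    return hits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32))
    img[7:11, 10:14] = img[2:6, 3:7].copy()  # copy a 4 x 4 patch by (5, 7)
    print(exhaustive_shift_search(img, threshold=0, min_matches=16))
```

Each shift costs one full pass over the image, which is why the method scales so poorly with image size.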
Autocorrelation: This technique is based on the fact that the 
original and copied segments will introduce peaks in the 
autocorrelation at the shifts corresponding to the segments which 
have been copied and moved. However, computing the 
autocorrelation after passing the given image through a 
high-pass filter provides better results. 
The autocorrelation of the image x of size M x N is 
defined by the formula: 

$$r_{k,l} = \sum_{i=1}^{M} \sum_{j=1}^{N} x_{i,j}\, x_{i+k,\,j+l}, \quad k = 0, \ldots, M-1, \quad l = 0, \ldots, N-1.$$
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The autocorrelation can be efficiently implemented using the 
Fourier transform, utilizing the fact that $r = x * x'$, where 
$x'_{ij} = x_{M+1-i,\,N+1-j}$, $i = 1, \ldots, M$, $j = 1, \ldots, N$. Thus we have 

$$r = F^{-1}\{F(x)\, F(x')\},$$ where F denotes the Fourier transform. 
The working of autocorrelation copy-move forgery detection 
method is explained in the flowchart below: 


[Flowchart not recoverable from the source; the surviving steps are "Accept Tested Image" followed by "Apply High-Pass Filter".] 

Figure 2. Flowchart depicting the working of the 
autocorrelation copy-move forgery 
detection method. 

Note: B: minimal size of a copied-moved 
segment; r: autocorrelation. 
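
The Fourier-transform shortcut for the autocorrelation can be checked numerically. The sketch below assumes 0-based indices taken cyclically (modulo M and N), which is the convention under which the FFT identity holds exactly; the function names are hypothetical.

```python
import numpy as np


def autocorr_direct(x):
    """Circular autocorrelation from its definition, one shift at a time."""
    x = np.asarray(x, dtype=float)
    rows, cols = x.shape
    r = np.empty((rows, cols))
    for k in range(rows):
        for l in range(cols):
            # np.roll with a negative shift gives x[(i + k) % rows, (j + l) % cols]
            r[k, l] = np.sum(x * np.roll(x, shift=(-k, -l), axis=(0, 1)))
    return r


def autocorr_fft(x):
    """Same quantity as the inverse FFT of F(x) times F(x'), x' being x flipped."""
    x = np.asarray(x, dtype=float)
    # x_flipped[i, j] == x[(-i) % rows, (-j) % cols]
    x_flipped = np.roll(x[::-1, ::-1], shift=(1, 1), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(x_flipped)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.random((6, 5))
    print(np.allclose(autocorr_direct(sample), autocorr_fft(sample)))
```

The direct version needs one pass over the image per shift, while the FFT version computes all shifts at once.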


that matches exactly two types of methods can be done using 


specified this is then considered for the match. To identify 
identical rows, the rows of the given matrix 'A' are 
lexicographically ordered. The matching rows are then 
searched by going through all m x n rows of the ordered matrix 
'A' and looking for two consecutive rows that are identical. 
The blocks form an irregular pattern that closely matches the 
copied-and-moved foliage. This method also indicates the 
use of a retouch tool on the pasted segment to cover the traces 
of the forgery. In the Robust match technique, the quantized 
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DCT coefficients are calculated and ‘Q’ factor is computed 
that determines the quantization steps for the DCT transform 
coefficients. Since quantized values of DCT coefficients for 
each block are compared, the algorithm might find too many 
matching block pairs. The number of false matches can be reduced by 
computing the shift vector 's' between two matching blocks as 
given below: 

$$s = (s_1, s_2) = (i_1 - j_1,\ i_2 - j_2),$$

where $(i_1, i_2)$ and $(j_1, j_2)$ are the positions of the two matching blocks. 
Because the shift vectors $-s$ and $s$ correspond to the same 
shift, the shift vectors s are normalized, if necessary, by 
multiplying by -1 so that $s_1 \ge 0$. 
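
The lexicographic sorting of block rows and the normalization of shift vectors can be sketched together. This is a simplified exact-match illustration on raw pixel values, not the robust DCT variant; the function name `exact_match_shifts` and the tie-break for $s_1 = 0$ are assumptions made here.

```python
from collections import Counter

import numpy as np


def exact_match_shifts(image, block_size):
    """Count normalized shift vectors between identical B x B blocks."""
    x = np.asarray(image)
    rows, cols = x.shape
    b = block_size
    # One entry per block position: (flattened block, top-left corner).
    table = []
    for i in range(rows - b + 1):
        for j in range(cols - b + 1):
            block = tuple(x[i:i + b, j:j + b].ravel().tolist())
            table.append((block, (i, j)))
    table.sort()  # tuples compare element by element, i.e. lexicographically

    counts = Counter()
    for (block_a, pos_a), (block_b, pos_b) in zip(table, table[1:]):
        if block_a == block_b:  # identical consecutive rows
            s1, s2 = pos_a[0] - pos_b[0], pos_a[1] - pos_b[1]
            # Normalize so that s1 >= 0; the s1 == 0 tie-break is an assumption.
            if s1 < 0 or (s1 == 0 and s2 < 0):
                s1, s2 = -s1, -s2
            counts[(s1, s2)] += 1
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    img = rng.integers(0, 256, size=(16, 16))
    img[9:15, 8:14] = img[1:7, 2:8].copy()  # copy a 6 x 6 patch by (8, 6)
    print(exact_match_shifts(img, block_size=4).most_common(1))
```

A shift vector that collects many matching block pairs points to a copied region, whereas isolated matches are more likely coincidental.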

The Exhaustive search is simple and effective and is the 
most obvious approach, whereas the Exact match approach 
works significantly better and faster than the other 
approaches. However, the Exhaustive search technique used in 
detecting copy-move forgery is computationally 
expensive; its computational complexity makes it 
impractical even 
for medium-sized images. 


4. RESULTS AND DISCUSSION 

The practice of forging photographs is probably as old as the 
art of photography itself. Digital photography and powerful 
image editing software such as Adobe Photoshop, XnView and 
ProShow Gold have made it very easy to create believable 
forgeries of digital pictures, even for a non-specialist. As 
digital photography continues to replace its analog 
counterpart, the need for reliable detection of digitally 
doctored images is quickly increasing. Recently, several 
different methods for detecting digital forgeries were 
proposed. Jessica Fridrich, David Soukal and Jan Lukáš 
proposed a method based on detection of Copy-Move 
Forgery in digital images. Also, Alin C Popescu and Hany 
Farid established a method for exposing digital forgeries by 
detecting Duplicated Image Regions. Micah K. Johnson and 
Hany Farid proposed several methods for exposing digital 
forgeries such as Detecting Inconsistencies in Lighting and 
detecting inconsistencies through Chromatic Aberration. For 
each of these methods, there are circumstances when they 
will fail to detect a forgery. The copy-move detection 
method is an efficient and reliable detection method which 
focuses on a special type of digital forgery — the copy-move 
attacks in which a part of the image is copied and pasted 
somewhere else in the image with the intent to cover an 
important image feature. The method may successfully 
detect the forged part even when the copied area is enhanced 
/ retouched to merge it with the background and when the 
forged image is saved in a lossy format, such as JPEG. This 
method supports two algorithms for detecting Copy-Move 
forgery, one that uses an exact match for detection and another 
that is based on an approximate match. The two approaches 
introduced for approximate matching are 
Exhaustive Search and Autocorrelation, whereas the two other 
approaches introduced are the Exact match algorithm and the 
Robust match algorithm. The Exhaustive search is 
simple and effective and is the most obvious approach, whereas 
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the Exact match approach works significantly much better 
and faster than other approaches. This method of detection is 
limited to one particular case of forgery, in which a certain 
part of the image is copied and pasted somewhere else in 
the same image (e.g., to cover an object). It is very difficult 
to use unintentional camera "fingerprints" related to sensor 
noise, color gamut, and / or dynamic range to discover 
tampered areas in images. Also, the Exhaustive search technique 
used in detecting copy-move forgery is 
computationally expensive; its computational 
complexity makes it impractical 
even for medium-sized images. The next 
method for detecting duplicated regions in an image works 
by first applying a Principal Component Analysis (PCA) on 
small fixed-size image blocks to yield a reduced dimension 
representation that is robust to minor variations in the image 
due to additive noise or lossy compression. Duplicated 
regions are then detected by lexicographically sorting all of 
the image blocks. This technique is effective on plausible 
digital forgeries, and its robustness and 
sensitivity to additive noise and lossy JPEG compression have 
been quantified. It automatically detects 
duplicated regions in a digital image, and detection 
is still possible even in the 
presence of significant amounts of corrupting noise. This 
technique works in the complete absence of digital 
watermarks or signatures, offering a complementary 
approach for image authentication. Still, there is little doubt that 
countermeasures will be created to foil this technique. The 
method for exposing digital forgeries by Detecting 
Inconsistencies in Lighting can be a useful 
tool for revealing traces of digital tampering 
in a digital composite of two or more people 
standing side by side. It is often difficult to exactly match 
the lighting conditions of the individual 
photographs under directional lighting (e.g. the sun on a 
clear day, a floor lamp, or a single directional light source in 
controlled lab settings). 

This method is efficient in estimating the direction of a point 
light source from only a single image using 
tools adapted from computer vision. The 
standard approaches used here for estimating the light source 
direction include: Infinite Light 
Source (3-D), Infinite Light Source (2-D), Local Light 
Source (2-D) and Multiple Light Sources. It can also be 
extended to accommodate a local directional light source, 
e.g. a desk lamp or floor lamp. Moreover, it is applicable and 
effective on both synthetically generated images and natural 
photographs. 
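
For the Infinite Light Source (2-D) case, the estimate can be sketched as a small least-squares problem. The sketch assumes a Lambertian surface of constant reflectance, so that the intensity at a boundary point is modeled as the dot product of the 2-D surface normal with the light direction plus a constant ambient term. The function name `estimate_light_2d` and the synthetic data are hypothetical.

```python
import numpy as np


def estimate_light_2d(normals, intensities):
    """Least-squares fit of intensity = normal . light + ambient.

    `normals` is a (p, 2) array of unit normals and `intensities` holds the
    p measured values; at least three points are needed for three unknowns.
    Returns the light direction in degrees and the ambient term.
    """
    normals = np.asarray(normals, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    if normals.shape[0] < 3:
        raise ValueError("at least three distinct points are required")
    design = np.column_stack([normals, np.ones(len(normals))])
    solution, *_ = np.linalg.lstsq(design, intensities, rcond=None)
    light_x, light_y, ambient = solution
    return float(np.degrees(np.arctan2(light_y, light_x))), float(ambient)


if __name__ == "__main__":
    # Synthetic check: light at 30 degrees, ambient term 0.1.
    true_light = np.array([np.cos(np.radians(30.0)), np.sin(np.radians(30.0))])
    angles = np.radians([-20.0, 10.0, 40.0, 70.0])
    normals = np.column_stack([np.cos(angles), np.sin(angles)])
    observed = normals @ true_light + 0.1
    print(estimate_light_2d(normals, observed))
```

Running the estimate separately for each person or object in a composite and comparing the recovered angles gives the consistency check described above; a large disagreement is a sign of possible tampering.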

One limitation of this method is that 
the solution requires knowledge of 3-D and 2-D surface 
normals from at least four and three distinct points, 
respectively, on a surface with the same reflectance. With 
only a single image and no objects of known geometry in the 
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scene, it is unlikely that this will be possible. Manipulations 
in images may require the creation or 
removal of shadows and lighting gradients. Also, this 
method assumes a nearly Lambertian surface for both the 
forged and original areas and might not work when the 
object does not have a compatible surface, or when pictures of 
both the original and forged objects were taken under 
approximately similar lighting conditions. The method also 
may not work on a cloudy day, when no directional light 
source is present. The Chromatic aberration method is 
used for automatically estimating lateral chromatic 
aberration and shows its efficacy in detecting digital 
tampering. Lateral Chromatic aberration manifests itself, to a 
first order approximation, as an expansion / contraction of 
color channels with respect to one another. When tampering 
with an image, this aberration is often disturbed and fails to 
be consistent across the image. This approach is effective 
when the manipulated region is relatively small, allowing for 
a reliable global estimate. It is efficient for detecting digital 
tampering in synthetic and real images and can be used to 
detect tampering in visually plausible forgeries. 

This model fails to estimate longitudinal chromatic 
aberrations and other forms of optical distortion. It also 
fails when the manipulated region is relatively large. 
For synthetic images, the average error is 3.4 degrees with 
93% of the errors less than 10 degrees. For calibrated / real 
images, the average error is 20.3 degrees with 96.6% of the 
errors less than 60 degrees. Thus, the average errors for real 
images are approximately six times larger than for 
synthetically generated images. Much of this error is due 
to longitudinal chromatic aberrations. 

The problem of detecting digital forgeries is a complex one 
with no universally applicable solution. Thus, a set of 
different tools can all be applied to the image at hand. The 
decision about the content authenticity is then reached by 
interpreting the results obtained from the different 
approaches. This accumulated evidence may provide a 
convincing argument that no individual method can. In 
future, all these techniques, in conjunction with a growing 
body of other forensic tools, should be effective in exposing 
digital forgeries. The comparative analysis of the selected 
algorithms mentioned above, on the basis of their merits 
and demerits, domain, types of input and output etc., is 
presented in Table 1. 


5. CONCLUSION 

Techniques and methodologies for validating the 
authenticity of digital images and testing for the presence of 
tampering and manipulation operations on them have 
recently attracted attention. Detecting forgery in digital 
images is one of the challenges of this exciting digital age. 
The sophisticated and low-cost tools of the digital age enable 
the creation and manipulation of digital images without 
leaving any perceptible traces. As a result, the authenticity of 
images cannot be taken for granted, especially when it comes 
to legal photographic evidence. Thus, the problem of 
establishing image authenticity has become more complex 
with the easy availability of digital images and free 
downloadable image editing software, leading to 
diminishing trust in digital photographs. A common 
manipulation when tampering with portions of an image is 
"copy-move". Spotting digital fakes by detecting 
inconsistencies in lighting is another method. In this paper 
we have primarily reviewed two approaches, Exhaustive 
Search and Autocorrelation, which are used for 
approximate block matching. The robust search method 
reduces the number of searches, whereas exact-match search 
is exhaustive and requires more memory and time. Therefore, 
the robust technique is better suited to time-dependent 
interactive searches. 
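
To make the idea of exact block matching for copy-move detection concrete, the following is a minimal Python sketch. It assumes a grayscale image stored as a list of rows and groups identical blocks through a dictionary lookup rather than the pairwise exhaustive comparison discussed in the paper; the function name and the block size are illustrative assumptions.

```python
from collections import defaultdict

def find_duplicate_blocks(image, block=2):
    """Group top-left coordinates of block x block regions with identical pixels.

    image is a list of equal-length rows of pixel values (an assumed layout).
    """
    height, width = len(image), len(image[0])
    seen = defaultdict(list)
    for r in range(height - block + 1):
        for c in range(width - block + 1):
            # The block contents themselves serve as the lookup key.
            key = tuple(tuple(image[r + i][c:c + block]) for i in range(block))
            seen[key].append((r, c))
    return [coords for coords in seen.values() if len(coords) > 1]
```

Every overlapping block is still visited once, so memory grows with the number of blocks. Exact matching of this kind would miss a copied region that has been recompressed or retouched, which is the case the robust (approximate) match is intended to handle.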


6. FUTURE SCOPE 

We are continuing to work in the field of Digital Image 
Tampering in the following areas: 

1. Analyzing other recent algorithms related to forgery 
detection, such as Digital Watermarking, 
Inconsistencies in Complex Lighting Environments, 
Color Filter Array Interpolation, Re-sampling etc. 

2. Video Forgery Detection Methods. 
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ABSTRACT 

The Extensible Markup Language [XML] database is one of 
the active areas of research nowadays. A number of 
developers and research organizations are working on the 
capabilities and efficiency of XML databases because of 
their unique structure and processing speed. In any web 
based database search, there is need of a server side script 
as well as a back-end RDBMS [Relational Database 
Management System]. Generally SQL [Structured Query 
Language] is used for querying the database. Moreover, the 
web hosting cost is also very high in various plans using 
these technologies. XML has an exceptional feature: it can 
be used as a database as well as a document. An XML 
document is generally formatted using CSS [Cascading Style 
Sheets] or XSL [Extensible Stylesheet Language]. 

In this research work, we have compared the capability and 
efficiency of XML as a database rather than a simple web 
document. This work examines the competence of PHP 
[Hypertext Preprocessor] with XML against PHP with 
MYSQL. In our tests, XML gave shorter query execution 
times than MYSQL. 


KEYWORDS 
Optimization, XML, Web Development, Database. 


1. INTRODUCTION 

In any web based database search, there is need of a server 
side script as well as back-end RDBMS [Relational 
Database Management System]. Generally SQL [Structured 
Query Language] is used for querying the database. 
Moreover, the web hosting cost is also very high in various 
plans using these technologies. XML has an exceptional 
feature: it can be used as a database as well as a document 
[1]. An XML document is generally formatted using CSS 
[Cascading Style Sheets] or XSL [Extensible Stylesheet 
Language] [2]. 


2. EXISTING THEORY 

The problem in the existing architecture is the platform 
dependence of various scripts and languages. Hosting on 
the Windows platform supports Microsoft products and is 
costlier than the other hosting plans [3]. Servers which 
host the website require operating systems and licenses. 
Windows 2003 and related applications like SQL Server 
each cost a significant amount of money; on the other 
hand, Linux is a free operating system to download, 
install and operate. 
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The basic ideas underlying XML are very simple: tags on 
data elements identify the meaning of the data, rather than, 
e.g., specifying how the data should be formatted [as in 
HTML], and relationships between data elements are 
provided via simple nesting and references [4]. 

To host the website on the Windows Server platform, we are 
restricted to using ASP [Active Server Pages] for database 
maintenance. At the back-end, we are restricted to 
MS-ACCESS or SQL SERVER as the RDBMS. 

On the other side, Linux hosting uses PHP or CGI [Common 
Gateway Interface] / PERL [Practical Extraction and Report 
Language] with MYSQL as the back-end database. ColdFusion 
and JSP do not fit in these categories of hosting. Java Server 
Pages [JSP] is a Java technology that allows software 
developers to create dynamic web applications with 
HTML, XML, or other document types. JSP needs a web 
server compatible with Java technology, and the commonly 
used web servers are JRun and Apache Tomcat. On the 
other hand, ColdFusion is a software language that is also 
used for Internet application development, such as 
dynamically generated web sites. ColdFusion is similar to 
Active Server Pages, JavaServer Pages or PHP, but it is 
platform independent. 

More specifically, Microsoft based technologies require 
additional tools, including antivirus utilities for security, 
which makes them more costly to host on international web 
hosting servers [5]. 

Table 1 depicts the platform dependence of various scripts 
and respective back-end databases. 


Server Side Script [HTTP Web Server Required] | Back-end Database 
[illegible]                                  | MS Access, SQL Server 
JSP [JRun, WebSphere]                        | MS Access, SQL Server, [illegible], Oracle 
[illegible] [Apache HTTP Server]             | [illegible], SQL Server, Oracle 
JavaScript [No Need Of Web Server]           | [illegible] 
XML [No Need Of Web Server]                  | It may be used as Database itself 

Table 1. Different Server Side Scripts with their back-end 
databases 
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Note —The above specified table has been prepared after 
hands-on experience with all these technologies. 


3. RESEARCH METHODOLOGY 

We have taken two databases, XML and MYSQL, for storing 
records. The values from the database are fetched using PHP. 
The entire exercise was performed to measure the execution 
time of getting results from the database. The script was 
executed and tested on all prominent web browsers so that 
the actual performance could be observed. 

To evaluate the efficiency of the XML database, we have 
used the LAMP [Linux, Apache, MYSQL, PHP] stack on one 
side and LAXP [Linux, Apache, XML, PHP] on the other. 
These two frameworks are used to obtain the same results, 
so that their execution times can be compared. 

First, we used XML as the back-end database and retrieved 
the records using PHP. The execution time of the query was 
recorded so that it could be compared with the other stack, 
LAMP. 

In the second attempt, we used LAMP [Linux, Apache, 
MYSQL, PHP], with the same database structure and 
records. These records were retrieved from the MYSQL 
database using PHP. 

In this attempt, the execution time of the script and the 
query execution time were recorded. Finally, the execution 
times of the queries were compared. In our comparison, 
XML was the faster option as a back-end database than the 
MYSQL RDBMS package. 

Here is the source code in PHP to fetch the records: 


PHP SCRIPT FOR CONNECTION AND QUERIES 
EXECUTION 
<?php 
// Record the start time of the query. 
list($usec, $sec) = explode(' ', microtime()); 
$querytime_before = ((float)$usec + (float)$sec); 
$q = $_GET['q']; 

// Load the XML database and collect all NAME elements. 
$xmlDoc = new DOMDocument(); 
$xmlDoc->load("EmployeeDatabase.xml"); 
$x = $xmlDoc->getElementsByTagName('NAME'); 
$y = null; 
for ($i = 0; $i <= $x->length - 1; $i++) 
{ 
    //Process only element nodes 
    if ($x->item($i)->nodeType == 1) 
    { 
        if ($x->item($i)->childNodes->item(0)->nodeValue == $q) 
        { 
            // Remember the EMP record that contains the match. 
            $y = ($x->item($i)->parentNode); 
        } 
    } 
} 

// Print every field of the matching record, if one was found. 
if ($y !== null) 
{ 
    $Employee = ($y->childNodes); 
    for ($i = 0; $i < $Employee->length; $i++) 
    { 
        //Process only element nodes 
        if ($Employee->item($i)->nodeType == 1) 
        { 
            echo($Employee->item($i)->nodeName); 
            echo(": "); 
            echo($Employee->item($i)->childNodes->item(0)->nodeValue); 
            echo("<br />"); 
        } 
    } 
} 

// Record the end time and report the elapsed time. 
list($usec, $sec) = explode(' ', microtime()); 
$querytime_after = ((float)$usec + (float)$sec); 
$querytime = $querytime_after - $querytime_before; 
$strQueryTime = 'Query took %01.4f sec'; 
echo sprintf($strQueryTime, $querytime); 
?> 


3.1 OPTIMIZATION ISSUES 

• Using XML as back-end database entirely rather than a 
costly RDBMS package hosting 

• Execution time of the queries 

• Optimization of Back-end Database 

• Optimization of processing load on the Server 


3.2 DEVELOPMENT TOOLS TO BE USED: 

XML with its various standards with PHP as server side 
script 

• XML-DOM [Document Object Model] 

• XQuery 

• XForms 


3.3 TESTING TOOLS 

The web scripts/code written in XML will be tested in 
various prominent HTTP Clients [Web Browsers]: 

1. Internet Explorer 7 

2. Firefox 1.0.2 

3. Mozilla 1.7.8 

4. Opera 8 

5. Netscape 6 


4. SIMULATION AND EXPERIMENT RESULTS 
FOUND 
4.1 XML DATABASE STRUCTURE USED 
An XML document has a logical and a physical structure. 
More specifically, the XML document is composed of units 
called entities. An entity may refer to other entities, which 
causes their inclusion in the document. An XML document 
begins with a root or document entity. The document 
comprises various declarations, elements, comments, 
character references, and processing instructions, all of 
which are indicated in the document by explicit markup. A 
software component called an XML processor is used to 
read XML documents and provide access to their content 
and structure. An XML processor does its work on behalf 
of another module, which is called the application. The 
XML specification describes the required behavior of an 
XML processor in terms of how it should read the XML 
data and the information it must provide to the application 
[6]. 
Here is the XML database which is used at the back-end, 
named EmployeeDatabase.xml: 
<?xml version="1.0" encoding="ISO-8859-1"?> 
<EMPLOYEES> 
<EMP> 
<ID>1</ID> 
<NAME>BOB DYLAN</NAME> 
<SALARY>9000</SALARY> 
<DEPARTMENT>RESEARCH</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>2</ID> 
<NAME>BONNIE TYLER</NAME> 
<SALARY>10000</SALARY> 
<DEPARTMENT>TESTING</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>3</ID> 
<NAME>DR. SMITH</NAME> 
<SALARY>19000</SALARY> 
<DEPARTMENT>TESTING</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>4</ID> 
<NAME>GARY MOORE</NAME> 
<SALARY>23000</SALARY> 
<DEPARTMENT>DEVELOPMENT</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>5</ID> 
<NAME>EROS RAMAZZOTTI</NAME> 
<SALARY>30000</SALARY> 
<DEPARTMENT>DEVELOPMENT</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>6</ID> 
<NAME>BEE GEES</NAME> 
<SALARY>13000</SALARY> 
<DEPARTMENT>RESEARCH</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>7</ID> 
<NAME>DR.HOOK</NAME> 
<SALARY>UK</SALARY> 
<DEPARTMENT>TESTING</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>8</ID> 
<NAME>ROD STEWART</NAME> 
<SALARY>12000</SALARY> 
<DEPARTMENT>DEVELOPMENT</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>9</ID> 
<NAME>ANDREA BOCELLI</NAME> 
<SALARY>17000</SALARY> 
<DEPARTMENT>DEVELOPMENT</DEPARTMENT> 
</EMP> 
<EMP> 
<ID>10</ID> 
<NAME>PERCY SLEDGE</NAME> 
<SALARY>80000</SALARY> 
<DEPARTMENT>RESEARCH</DEPARTMENT> 
</EMP> 
</EMPLOYEES> 
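
For readers who want to try the same lookup outside PHP, the following is a minimal Python sketch of the record search and timing. It uses only the standard library; the inline two-record sample, the function name find_employee and the use of time.perf_counter are assumptions made for illustration, and the timings it prints are not comparable with those reported in this paper.

```python
import time
import xml.etree.ElementTree as ET

# Two records in the same layout as EmployeeDatabase.xml (assumed sample).
SAMPLE = """<EMPLOYEES>
  <EMP><ID>9</ID><NAME>ANDREA BOCELLI</NAME>
       <SALARY>17000</SALARY><DEPARTMENT>DEVELOPMENT</DEPARTMENT></EMP>
  <EMP><ID>10</ID><NAME>PERCY SLEDGE</NAME>
       <SALARY>80000</SALARY><DEPARTMENT>RESEARCH</DEPARTMENT></EMP>
</EMPLOYEES>"""

def find_employee(xml_text, name):
    """Return the fields of the first EMP whose NAME equals name, else None."""
    root = ET.fromstring(xml_text)
    for emp in root.findall("EMP"):
        if emp.findtext("NAME") == name:
            return {child.tag: child.text for child in emp}
    return None

start = time.perf_counter()
record = find_employee(SAMPLE, "ANDREA BOCELLI")
elapsed = time.perf_counter() - start

print(record)
print("Query took %.4f sec" % elapsed)
```

As in the PHP script, the whole document is parsed and scanned linearly on every query, so the lookup cost can be expected to grow with the number of EMP records.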


4.2 EXPERIMENT RESULTS ON EXECUTION OF 

[column headings illegible] 

ATTEMPT 1  | Query took 0.0435 sec | Query took 0.0885 sec 
ATTEMPT 2  | Query took 0.0430 sec | Query took 0.0825 sec 
ATTEMPT 3  | Query took 0.0235 sec | Query took 0.0715 sec 
ATTEMPT 4  | Query took 0.0245 sec | Query took 0.0825 sec 
[rows for ATTEMPT 5 to ATTEMPT 11 are illegible; surviving 
fragments: Query took 0.0845 sec, Query took 0.0755 sec] 
ATTEMPT 12 | Query took 0.0630 sec | Query took 0.0825 sec 
ATTEMPT 13 | Query took 0.0435 sec | Query took 0.0755 sec 
ATTEMPT 14 | Query took 0.0645 sec | Query took 0.0865 sec 
ATTEMPT 15 | Query took 0.0435 sec | Query took 0.0875 sec 
ATTEMPT 16 | Query took 0.0325 sec | Query took 0.0785 sec 
ATTEMPT 17 | Query took 0.0535 sec | Query took 0.0755 sec 
ATTEMPT 18 | Query took 0.0625 sec | Query took 0.0835 sec 
ATTEMPT 20 | Query took 0.0235 sec | Query took 0.0815 sec 

Select the Department: DEVELOPMENT 


ID: 9 

NAME: Andrea Bocelli 
SALARY: 17000 
DEPARTMENT: Development 
Query took 0.0435 sec 


5. CONCLUSION 

Generally, to implement web based or standalone database 
maintenance, there must be a Web Server [known as HTTP 
Server], which caters to the requests from the Web Browser 
[known as HTTP Client]. With this approach, the additional 
resource [Back-End Database] for web based database 
searching is avoided; instead, XML is used as the database 
for storage, retrieval and search. We have achieved the 
goal of optimizing resources using XML with PHP. 


The existing structure of web based database maintenance 
includes the ADO, which builds the connection between the 
various web applications. A range of server side scripts is 
available for structuring the ADO connection. The 
well-known server side scripts are ASP [Active Server 
Pages], PHP [Hypertext Preprocessor], CGI [Common 
Gateway Interface], PERL [Practical Extraction and Report 
Language], JSP [Java Server Pages] and Macromedia 
ColdFusion. The web administrator is required to use the 
ADO connection for creation, updating and maintenance of 
the back-end database. XML overcomes all these 
complications by its use as a database structure. 

ATTEMPT 19 | Query took 0.0735 sec | Query took 0.0625 sec 
(row belonging to the table in Section 4.2) 
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ABSTRACT 

Researchers often need to find expertise in their chosen area 
of research. Finding expertise is very useful, as relevant 
research papers can be studied and the experts can be 
identified. Therefore, finding expertise in the chosen area of 
research has always attracted interest among the academic 
community. These days research institutions and individual 
researchers make their publications and research findings 
available on the web. With the explosive growth of the World 
Wide Web, search engine users are overwhelmed by the huge 
volume of results returned in response to a simple query, 
which is far too large to yield the desired knowledge. 
Therefore, one of the methods of finding expertise is by 
way of efficiently and accurately clustering the web 
documents, which enhances the integrity of the web search 
engine. Data mining techniques have matured, making it 
possible to automate web document clustering. In this paper, 
we present a K-Means clustering approach based on mutually 
exclusive Maximal Frequent Item set discovery. It has been 
implemented in JAVA. The common text processing 
approach is to convert the downloaded web documents into 
vectors. This is done by extracting document features, which 
generates the document-feature data set. For a set of 
documents, the feature set is composed of all terms 
appearing in any one of the documents. We call this a 
document-feature data set. If document m contains feature n, 
then the corresponding value, in row n and column m of the 
table, is set to one. Otherwise, it is zero. Then, the Apriori 
algorithm is applied to this document-feature data set. The 
mutually exclusive frequent sets generated by the Apriori 
algorithm are taken as the initial points of the K-Means 
algorithm. The output of the K-Means clustering algorithm is 
the sets of highly related documents appearing together with 
the same features. This approach enables the clustering of 
web documents. It enables researchers to find the documents 
related to their desired area clustered and displayed 
together during the web search. It will significantly help 
them in terms of saving time and getting all the relevant 
papers together in a cluster. 


KEYWORDS 

Web Document Clustering, Vector space model, Term 
frequency, Inverse document frequency, Apriori algorithm, 
maximal frequent set, k-means clustering. 


1. INTRODUCTION 
The growth of the Internet has seen an explosion in the 
amount of information available. Document clustering plays 
an important role in helping people organize this vast 
amount of data. It attempts to organize documents into 
groups such that documents within a group are more similar 
to each other than documents belonging to different groups. 
Researchers often need to find expertise in their chosen area 
of research. This is very useful, as relevant research papers 
can be studied and the experts can be identified. Therefore, 
finding expertise in the chosen area of research has always 
attracted interest among the academic community. These 
days research institutions and individual researchers make 
their publications and research findings available on the 
web. The first stage in any document clustering technique is 
the document representation model. 

The rest of this paper is organized as follows: in Section 2, 
the Vector Space Model that is used in the literature for 
document clustering is briefly introduced. Section 3 presents 
the k-means clustering algorithm and the method used to 
calculate initial centroids in detail. Section 4 describes the 
Web Document Clustering algorithm for finding expertise in 
a Research Area in detail. The experimental results are given 
in Section 5. Finally, the conclusion and some future research 
directions are presented in Sections 6 and 7 respectively. 


2. DATA MODEL 
Most clustering algorithms expect the data set to be available 
in the form of a set of vectors 

$X = (x_1, x_2, \ldots, x_m)$ 
where the vector $x_i$, $i = 1, \ldots, m$, corresponds to a 
single object in the data set and is called the feature vector. 
Extracting the proper features to represent through the 
feature vector is highly dependent on the problem domain. 


2.1 Document Data Model 
The Vector Space model is selected to represent document 
objects. Each document is represented by a vector $d_i$ in the 
term space such that 

$d_i = (w_{1i}, w_{2i}, \ldots, w_{Ni})$    (1) 
where $w_{ki}$, $k = 1, \ldots, N$, is the weight of term $k$ in 
document $i$, calculated as explained in the following 
paragraph. 


A term weighting scheme is employed here to measure the 
significance of each term [2]. In this scheme, $tf$ represents 
term frequency (TF) and $idf$ represents inverse document 
frequency (IDF). The assumptions behind TF*IDF are based 
on two empirical observations. First, the more times a term 
occurs in a document, the more relevant it is to the topic. 
Second, the more times a term appears throughout all 
documents in the whole collection, the more poorly it 
discriminates between documents. Term frequency is the 
number of times a term $k$ appears in a document $i$, and 
$tf(k, i)$ is used to denote it. Inverse document frequency is 
inversely related to $df(k)$, which is the document frequency 
of term $k$. Given M documents and N terms, the 
computation of $idf(k)$ is as follows [2]: 

$idf(k) = \log\left(\frac{M}{df(k)}\right)$    (2) 
Therefore, the weight is given as 

$w_{ki} = tf(k, i) \cdot idf(k)$    (3) 
After the above transformation, the complicated, hard-to- 
understand documents are converted into machine 
acceptable, mathematical representations. The problem of 
measuring the similarity between documents is now 
converted to the problem of calculating the distance between 
document vectors. The standard cosine similarity, which 
measures the cosine of the angle between two vectors, is 
utilized in our application. It is computed as follows: 

$\cos(d_i, d_j) = \frac{d_i \cdot d_j}{\|d_i\| \, \|d_j\|}$    (4) 

For a group of vectors $A_j$ in K-means, they need to be 
represented by their "central" vector. This central vector 
($C_j$) is generated by taking the average value of all the 
points included in this group. It is calculated as follows: 

$C_j = \frac{1}{|A_j|} \sum_{d \in A_j} d$    (5) 
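
The weighting, similarity and centroid computations of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: documents are given as lists of already stemmed tokens, the natural logarithm is used for idf (the base is not specified in the paper), and the function names are hypothetical.

```python
import math

def tf_idf_vectors(docs):
    """docs: list of token lists. Returns (terms, vectors) with w = tf * idf."""
    terms = sorted({t for doc in docs for t in doc})
    m = len(docs)
    # df(k): number of documents that contain term k
    df = {t: sum(1 for doc in docs if t in doc) for t in terms}
    idf = {t: math.log(m / df[t]) for t in terms}
    vectors = [[doc.count(t) * idf[t] for t in terms] for doc in docs]
    return terms, vectors

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise average of a group of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]
```

Note that a term occurring in every document receives idf = log(1) = 0 and therefore contributes nothing to the similarity, which matches the second empirical observation above.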


3. CLUSTERING ALGORITHMS 

The process of grouping a set of physical or abstract objects 
into classes of similar objects is called clustering. A cluster 
is a collection of data objects that are similar to one another 
within the same cluster and are dissimilar to the objects in 
other clusters. A cluster of data objects can be treated 
collectively as one group and so may be considered as a 
form of data compression. 


3.1. K-Means Clustering Algorithm 

K-means is one of the simplest unsupervised learning 
algorithms that solve the well known clustering problem [6]. 
The procedure follows a simple and easy way to classify a 
given data set through a certain number of clusters (assume 
k clusters) fixed a priori. The main idea is to define k 
centroids, one for each cluster. These centroids should be 
placed carefully, because different locations cause 
different results. So, the better choice is to place them as 
far away from each other as possible. The next step is 
to take each point belonging to the given data set and 
associate it with the nearest centroid. When no point is 
pending, the first step is completed and an early grouping is 
done. At this point we need to re-calculate k new centroids 
as barycenters of the clusters resulting from the previous 
step. After we have these k new centroids, a new binding has 
to be done between the same data set points and the nearest 
new centroid. A loop has been generated. As a result of this 
loop, the k centroids change their location step by step until 
no more changes occur. In other words, the centroids do not 
move any more. 

Finally, this algorithm aims at minimizing an objective 
function, in this case the cosine distance specified in the 
previous section. 

The algorithm is composed of the following steps: 

1. Place K points into the space represented by the objects 
that are being clustered. These points represent initial 
group centroids. 

2. Assign each object to the group that has the closest 
centroid. 

3. When all objects have been assigned, recalculate the 
positions of the K centroids. 

4. Repeat Steps 2 and 3 until the centroids no longer move. 
This produces a separation of the objects into groups 
from which the metric to be minimized can be 
calculated. 
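
The four steps listed above can be sketched in Python as follows. This is a minimal illustration, assuming that distance is measured as one minus the cosine similarity, that the initial centroids are supplied by the caller, and that a cluster left without members keeps its previous centroid; the function names and the max_iter safeguard are assumptions, not part of the paper's JAVA implementation.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; vectors of zero length are treated as distance 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)

def k_means(points, initial_centroids, max_iter=100):
    """Return (assignment, centroids) once assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]  # Step 1: given start points
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda j: cosine_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:  # Step 4: nothing moved, stop
            break
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

Because the result depends on the starting centroids, the choice made in Step 1 matters, which is the motivation for the initialization method of Section 3.2.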


3.2. Calculating initial cluster centroids 

The Apriori algorithm [5] is the most well known association 
rule mining algorithm. It uses the following property, which 
we call the large itemset property: any subset of a large 
itemset must be large. The large itemsets are also said to be 
downward closed because if an itemset satisfies the 
minimum support requirement, so do all of its subsets. 

The basic idea of the Apriori algorithm is to generate 
candidate itemsets of a particular size and then scan the 
database to count these to see if they are large. During scan 
i, candidates of size i, $C_i$, are counted. Only those 
candidates that are large are used to generate candidates for 
the next pass. That is, $L_i$ is used to generate $C_{i+1}$. An 
itemset is considered as a candidate only if all of its subsets 
are large. 

We can use this algorithm to generate the initial points of k- 
means algorithm for document clustering. 
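
A compact Python sketch of the level-wise candidate generation described above is given here for illustration. It assumes that transactions are given as sets of items and that minimum support is an absolute count; the function name is hypothetical and no attempt is made at the efficiency of a production implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    candidates = {frozenset([i]) for i in items}  # C_1
    large = {}
    size = 1
    while candidates:
        # Scan the database and count the candidates of the current size.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}  # L_i
        large.update(level)
        size += 1
        # Join step: unions of large itemsets that have the next size.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: keep a candidate only if all its subsets are large.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return large
```

The prune step is where the large itemset property is used: a candidate with even one subset that is not large cannot itself be large, so it is never counted.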


4. ALGORITHM DESCRIPTION 

Assume that each document in the document-feature data set 
corresponds to an item in the transactional database; each 
feature corresponds to a transaction. The aim is to search for 
highly related documents appearing together with same 
features. Similarly, the frequent item set discovery in the 
transaction database serves the purpose of finding items 
appearing together in many transactions. Therefore, if we 
apply frequent item set discovery to our document feature 
data set, “frequent” document set will be discovered. 

Here frequent document sets are documents appearing 
together with the same features, i.e., document sets which 
have a large number of features in common. These documents 
are considered to be related to a certain extent. Minimum 
support is the minimum similarity among documents in our 
application. 
The advantage of using frequent item set discovery is that it 
can capture the relation among more than two documents, 
while the normal similarity measurement, such as the cosine 
similarity mentioned above, can only calculate the proximity 
between two documents. Moreover, frequent item set 
discovery is capable of detecting the most related document 
sets in the whole collection. These document sets can be 
viewed as having the highest density if we imagine all these 
document vectors in an n-dimensional space. The density 
inside a correctly defined cluster is normally higher than in 
its outside area. Therefore, these document sets are regarded 
as the initial clusters and their centroids are the initial points 
for the K-means algorithm. 

A maximal frequent item set mining algorithm is employed 
in this experiment. Suppose that the required cluster number 
is k. Then we take the k maximal frequent item sets with the 
largest support. The centers of those frequent item sets are 
the initial points of K-means algorithm. 
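
The selection of initial points can be sketched in Python under a few assumptions: the document-feature data set is given as a mapping from each feature to the set of documents containing it (so each feature acts as a transaction and each document as an item), maximal frequent document sets are found by brute force, which is only feasible for very small examples, and ties in support are broken by document id. All names are hypothetical.

```python
from itertools import combinations

def maximal_frequent_doc_sets(feature_to_docs, min_support):
    """Return {doc set: support} for frequent sets with no frequent superset."""
    docs = sorted(set().union(*feature_to_docs.values()))
    frequent = {}
    for size in range(1, len(docs) + 1):
        for combo in combinations(docs, size):
            s = frozenset(combo)
            # Support: number of features shared by all documents in s.
            support = sum(1 for d in feature_to_docs.values() if s <= d)
            if support >= min_support:
                frequent[s] = support
    return {s: n for s, n in frequent.items()
            if not any(s < other for other in frequent)}

def initial_points(maximal, vectors, k):
    """Centres of the k maximal sets with the largest support."""
    top = sorted(maximal, key=lambda s: (-maximal[s], sorted(s)))[:k]
    centers = []
    for s in top:
        members = [vectors[d] for d in sorted(s)]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers
```

The returned centres would then be passed to K-means as its starting centroids in place of randomly chosen points.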

The clustering process can be summarized as follows: 


ALGORITHM 
Input: Text files containing abstracts of various research 
papers, Stop word list and Minimum support. 
Step1: Read terms in text files containing abstracts of 
research papers. 
Step2: Remove terms in Stop word list and apply 
stemming using Porter Stemming Algorithm [8]. 
Step3: Prepare document feature matrix. 
Step4: The matrix generated in Step 3 and the minimum 
support will be given as input to Apriori Algorithm to get 
the Maximal Frequent Item sets (MFI) as output. 
$MFI = (I_1, I_2, \ldots, I_k)$ 
where $I_j = (d_1, d_2, \ldots, d_i)$. 
Step5: For each document, generate the document vector. 
$d = (tf(t_1, d) \cdot idf(t_1), tf(t_2, d) \cdot idf(t_2), \ldots, tf(t_N, d) \cdot idf(t_N))$ 
Step6: Calculate the initial centers as follows: 
Calculate the center of each item set in MFI. 
Then IP is: 

$P_1 = Center(I_1)$ 

$P_2 = Center(I_2)$ 

... 

$P_k = Center(I_k)$ 

where 

$Center(I_j) = \frac{1}{|I_j|} \sum_{d \in I_j} d$ 

Step7: Set the initial points of K-means algorithm as IP. 
Get clustering result. 

Output: The sets of highly related documents appearing 
together with same features. 

The algorithm is depicted in Figure 1. 


5. EXPERIMENTAL RESULTS 
The process is implemented in JAVA. 
A simple example is given here to illustrate the whole 
process of the approach. The data tested consists of twelve 
abstracts whose names are given in Table 1. The feature 
set includes six terms: document, cluster, vector, space, 
model, term. Table 2 shows the details of this document- 
feature data set. Given a minimum support of 50%, two 
maximal frequent document sets are discovered; the 
discovery procedure is depicted in Figure 2. Document 
vectors calculated by using equation 1 and equation 3 are 
shown in Table 2. They consist of six terms. The discovered 
maximal frequent document sets are considered to be the 
most highly related documents and they construct the initial 
clusters. Therefore, their cluster centroids are computed 
according to equation 5. We set these generated vectors as 
the initial points in the K-means algorithm. Then the 
algorithm starts to assign each document vector to its nearest 
cluster centroid and re-compute the new cluster center. This 
iteration continues until the clusters no longer change. 
Figure 3 illustrates the process and shows the final results. 
These twelve documents are divided into two groups. 


6. CONCLUSION 

In this paper, an approach for clustering web documents has 
been proposed. The experimental results of testing on web 
documents show that the proposed web document clustering 
method clusters the relevant documents more reliably and 
simply than other document clustering methods. 

The proposed web document clustering method clusters the 
documents and presents to researchers only those 
documents that they are looking for. 


7. FUTURE SCOPE 

A study can be undertaken to assess the possibility of 
combining this method with clustering algorithms using 
wavelet analysis. As an extension, similar clustering 
techniques can be used to find the current trend of a 
particular research area, and to find the leading journals in a 
research area and the details about the researchers who are 
working in the same area. 
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Figure 1. Discovering Initial Points Using 
Apriori Algorithm 
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Figure 2. Process of K-Means Clustering 
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ABSTRACT 

Recent advancements in network technology allow 
integration of different services on the same networking 
infrastructure. Thus, voice, data and video or multimedia 
traffic share the same transmission, switching and storage 
resources over a single network. This integration offers the 
user a single access facility to all communication services 
through a unified interface. Since Asynchronous 
Transfer Mode (ATM) networks support diverse 
services such as voice, data, video etc., ATM has 
been chosen for use in the Broadband Integrated 
Service Digital Networks (B-ISDN). 

In this paper, we have developed a queuing model for 
ATM networks in which three types of traffic, namely voice, 
data, and video, are considered. We analyze a discrete time 
single-server (GI/I/1) queuing system with three priority 
queues of infinite capacity. The waiting time distribution for 
the packets in each class is derived explicitly. We have also 
derived expressions for the probability generating function 
of the system contents along with the packet delay of the 
classes considered in the study. 


KEYWORDS 
B-ISDN, Priority Scheduling, ATM Networks, Probability 
Generating Function, packet delay. 


1. INTRODUCTION 

With the increased demand for communication services of all
kinds (voice, data, video, etc.), the Broadband Integrated
Services Digital Network (B-ISDN) has received increased
attention in the past few years. The key to the success of a
B-ISDN system is the ability to support a wide variety of traffic
and diverse service and performance requirements. The
B-ISDN is an appropriate choice to support traffic requiring
bandwidth ranging from a few kilobits per second (e.g. a
slow terminal) to several hundred megabits per second (e.g.
moving image data). Some traffic, such as interactive data
and video, is highly bursty, while some traffic, such as large
files, is continuous. The B-ISDN is also required to meet the
diverse service and performance requirements of multimedia
traffic. Some services, such as real-time video
communication, require error-free transmission as well as
rapid transfer [1]. The B-ISDN has received increased
attention as a communication architecture capable of
supporting multimedia applications. B-ISDN networks
are being designed to carry the traffic generated by a wide
range of services. These services will have diverse traffic
flow characteristics and performance requirements. Among
the techniques proposed to implement B-ISDN,
Asynchronous Transfer Mode (ATM) is considered to be the
most promising because of its efficiency and
flexibility [2], [3]. ATM is a fixed-length transport
scheme, which can carry a heterogeneous mix of traffic in an
integrated and efficient way by statistically multiplexing bursty
traffic flows. ATM can be considered a switching
technology that supports two fundamental approaches to
switching: circuit switching and packet switching [4], [5]. A
B-ISDN should be able to facilitate expected (as well as
unexpected) future services in a practical and easily expandable
fashion. A few examples of expected future services include
high-definition TV (HDTV), broadband videotext, and
video/document retrieval services [6], [7]. ATM is now
becoming a promising technology for the transport of
high-bandwidth applications. Different types of traffic need
different QoS standards. For real-time applications, the mean
delay and delay jitter must not be too large, while for
non-real-time applications the cell loss ratio (CLR) is the restrictive
quantity. Two priority categories can therefore be distinguished, which
will be referred to as delay priority and loss priority. Delay
priority scheduling tries to reduce the delay of
delay-sensitive traffic (such as voice). This is done by using a more
sophisticated type of scheduling than simple FIFO
scheduling: priority is given to delay-sensitive traffic over
delay-insensitive traffic. Several delay priority (or
cell scheduling) schemes, such as weighted round robin
(WRR) and weighted fair queuing (WFQ), have been proposed
and analyzed for ATM applications, each with its own
specific algorithmic and computational complexity [8]. On
the other hand, loss-priority schemes attempt to reduce the
cell loss of loss-sensitive traffic (such as data). Again,
various loss-priority (or cell discarding) strategies
for ATM, such as push-out buffer (POB) and partial buffer
sharing (PBS), have been presented in the literature [9]. An
overview of both types of priority schemes has been given
by Bae and Suda [10]. In ATM networks, one of the most
important problems is to meet the QoS for all traffic, e.g. the
delay and loss requirements for real-time and non-real-time
traffic. One method of solving this problem is the use of
priority control [11], [12], [13]. J. Walraevens et al. [14]
proposed a discrete-time queueing system with HOL (Head
Of Line) priority and also developed generating functions for
assessing the performance of ATM buffers. There have been
a number of contributions with respect to switches with
output queueing in the case of a single traffic type and a
FIFO scheduling discipline [15], [16], [17].

In this paper, we propose a new queueing model for
integrated high-speed data networks in which three types of
traffic are considered. We employ a priority queueing
discipline to analyze the mean delay of the system: high priority
is given to highly sensitive data (which cannot be stored for
a long period of time) and low priority is given to normal
data. We analyze a discrete-time single-server
(GI/1/1) queueing system with three priority queues of
infinite capacity. The waiting time distribution for the
packets in each class is derived explicitly, and expressions for the
probability generating function of the system contents are
also derived, along with the packet delay of the classes
considered in the study.


2. MATHEMATICAL MODEL 

We investigate a discrete-time queueing system with one
server and three priority classes, each with a queue of infinite
capacity. Time is assumed to be slotted and the transmission
time of a packet is one slot. We consider three types of traffic
arriving in the system, namely packets of class 1 (video),
packets of class 2 (voice) and packets of class 3 (data), which
arrive in the first, second and third queue respectively. In
multi-play (integrated service) systems, various types of data
can be transmitted through a single channel. In current
communication systems we can access broadband (Internet),
telephone and cable TV networks through a single channel. In
such an integrated communication system the priority
discipline plays an important role, because sensitive data
(high-priority data such as video) needs to be transmitted
without delay, whereas insensitive data (low-priority
data) can be stored for later transmission. Therefore, in this
model we assign the highest priority to video data, followed
by voice data and finally simple data.
The number of arrivals of class j during slot k is


denoted by $a_{j,k}$ (j = 1, 2, 3), and the $a_{j,k}$ are independent
and identically distributed (i.i.d.) from slot to slot. However,
in one slot, the number of arrivals of one class can be
correlated with the number of arrivals of the other classes.
The total number of arriving packets during slot k is denoted
by:

$a_{T,k} = a_{1,k} + a_{2,k} + a_{3,k}$

and its probability generating function (pgf) is defined as

$A_T(z) = E[z^{a_{T,k}}] = A(z, z)$.

Further, we define the marginal pgf's of the number of
arrivals from all classes:

$A_j(z) = E[z^{a_{j,k}}]$, where j = 1, 2, 3.

From these pgf's we can calculate the arrival rate of class j:

$\lambda_j = E[a_{j,k}] = A_j'(1)$.

The total arrival rate is the sum of the arrival rates of all
classes:

$\lambda_T = A_T'(1) = \lambda_1 + \lambda_2 + \lambda_3$

The system has one server that provides the transmission of
packets at a rate of one packet per slot. Newly arriving
packets can enter service at the beginning of the slot
following their arrival slot at the earliest. Packets in queue 1
have a higher priority than those in queue 2 and queue 3.
Packets in queue 2 have a higher priority than those in
queue 3.
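
The model described above can be explored with a small simulation. The following is only a sketch under stated assumptions: the function name `simulate_priority_queue`, the example arrival rates and the arrival process (three independent inlets, each emitting at most one packet per slot, with the class drawn in proportion to the class rates) are illustrative choices, not part of the analysis in this paper.

```python
import random
from collections import deque


def simulate_priority_queue(rates, slots=100_000, inlets=3, seed=1):
    """Return the simulated mean delay (in slots) of classes 1, 2 and 3.

    rates  -- assumed per-slot arrival rates (lambda_1, lambda_2, lambda_3)
    inlets -- assumed number of independent packet sources per slot
    """
    rng = random.Random(seed)
    p_arrival = sum(rates) / inlets        # chance that one inlet emits a packet
    queues = [deque() for _ in range(3)]   # queues[0] has the highest priority
    delay_sum = [0, 0, 0]
    served = [0, 0, 0]

    for slot in range(slots):
        # One packet per slot is served, taken from the first non-empty queue.
        for j, queue in enumerate(queues):
            if queue:
                delay_sum[j] += slot - queue.popleft()
                served[j] += 1
                break
        # Packets arriving in this slot can be served in the next slot at the earliest.
        for _ in range(inlets):
            if rng.random() < p_arrival:
                j = rng.choices(range(3), weights=rates)[0]
                queues[j].append(slot)

    return [delay_sum[j] / served[j] for j in range(3)]


if __name__ == "__main__":
    for j, d in enumerate(simulate_priority_queue((0.2, 0.3, 0.3)), start=1):
        print(f"class {j}: mean delay = {d:.2f} slots")
```

Each queue stores the arrival slot of its packets, so the delay of a packet is the difference between its service slot and its arrival slot, which is at least one slot.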


3. SYSTEM CONTENTS 

In this section, we derive the steady-state joint pgf of the
system contents of all three queues. We assume that the
packet in service (if any) is part of the queue that is serviced
in the slot. We denote the system contents of queue j at the
beginning of slot k by $u_{j,k}$ and the total system contents at
the beginning of slot k by $u_{T,k}$. As there are three distinct
classes of messages, we find it necessary to distinguish
among the imbedded points as to which class completes
service. This is indicated by the term class j epoch, where j
= 1, 2, or 3. Let $u_{j,k}$ be the number of class j messages in the
system at the k-th departure epoch. We can express $u_{j,k+1}$ as
the sum of the system contents at the previous
epoch and the number of new arrivals. If the (k+1)-th
departure epoch is a class 1 epoch, then a
class 1 message is in the process of departing from the system
and new messages of all three classes are arriving while
the class 1 message is being transmitted. We have, for
$u_{1,k} > 0$:

$u_{1,k+1} = u_{1,k} - 1 + a_{11}$   (1a)
$u_{2,k+1} = u_{2,k} + a_{12}$   (1b)
$u_{3,k+1} = u_{3,k} + a_{13}$   (1c)

where $a_{1i}$ (i = 1, 2, 3) is the number of class i messages
arriving during the transmission of a class 1 message. For
simplicity of discussion we have dispensed with any
reference to the departure time in the transmission of class 1
messages.

Similarly, if the (k+1)-th departure epoch is a class 2 epoch, we have,
for $u_{2,k} > 0$:

$u_{1,k+1} = a_{21}$   (2a)
$u_{2,k+1} = u_{2,k} - 1 + a_{22}$   (2b)
$u_{3,k+1} = u_{3,k} + a_{23}$   (2c)

Because of the priority discipline, there could not have been
any class 1 message in the system at the k-th departure. In
considering a class 3 epoch we recognize that the k-th
departure must have left the system devoid of class 1 and class 2
messages. We have, for $u_{3,k} > 0$:

$u_{1,k+1} = a_{31}$   (3a)
$u_{2,k+1} = a_{32}$   (3b)
$u_{3,k+1} = u_{3,k} - 1 + a_{33}$   (3c)


Joint pgf of the system contents of all queues at the 
beginning of slot (k+1) yields 
Up 5,25)" Elz", 257" 
Bz ен рв My 4 tOn A »0HB[z;" "MESE u = 0 
uz >0] + Ez" z7? uy, =u), = 04, > 0 
=z)" ЕД2 zy” JE[z," z2? z3 Elz," z7? ] ра 1+ 
Еау" 277] 
zi A(2,z2 [U ; Z123 }U „ 0.2, )] 
*z; A( 2,2, (т, —1) U, (0,0) -U, (0, 2;)1 
+А(22,) (4) 
For steady-state distribution of the system contents, 
U(z,,2,) we define as : 
0(21,2,)= lim U, (222) 
Applying this limit in equation (4), we get the following : 
U(zn)" 
Аб) (з) -000,5)1+ 
z; (zz) (e, -1)U(0,0) + U(0, z,)]+ (д2) 


U zA(nz) 
Se (z, - A(z,2,)) 


А 1, A(z, Xz, — 1) (0,0) (5) 
z,(z, ~ A(z,z,)) 


А (zx, A(1;2,) - 1, A(z x; ))U (0, т,) 
т,(® - A(z,2,)) 

The right-hand side of equation (5) contains two
quantities which still need to be determined, namely the function
$U(0, z_2)$ and the constant $U(0, 0)$.

To compute the function $U(0, z_2)$, we apply Rouché's
theorem: for a given value of $z_2$ in the unit
circle ($|z_2| \le 1$), the equation $z_1 = A(z_1, z_2)$ has one
solution in the unit circle for $z_1$, which will be denoted by
$Y(z_2)$ in the remainder and is implicitly defined by
$Y(z) = A(Y(z), z)$.
Since $Y(z_2)$ is a zero (i.e. root) of
the denominator of the right-hand side of equation (5) and a
generating function remains finite in the unit circle,
$Y(z_2)$ must also be a zero of the numerator.
Hence, we have

ss) ААЛА ESO ы -0 


By solving the above equation we get
оф)" mre) Ges Aes уво (6) 


ИКОН кр zs in equation (5) 
U(z,)- hA. Aun – DU(.O) 
[5 Az) niz —A(z,,)] (7) 
A1,1)- 1, A(z) 
]د‎ - 4(z,,)] 
Е 25 M Aena DU (0,0) 


The constant $U(0, 0)$ is obtained
by substituting $z_1$ by 1, by applying the normalization
condition $U(1, 1) = 1$ and by using l'Hôpital's rule. The result
is the probability of having an empty system:

$U(0, 0) = 1 - \lambda_T$

Notice that the stability condition is $\lambda_T < 1$.

Uz2,) 54a). , 4 00-4) (21)-54(54) 
n-A(nn) — z(n-4(n5) z,(z,- (22) (8) 
je E: : р | 

From this pgf, we can calculate the marginal pgf's
$U_j(z)$ (j = 1, 2, 3) of the system contents of class j:

$U_1(z) = \lim_{k \to \infty} E[z^{u_{1,k}}] = U(z, 1)$

By putting $z_1 = z$ and $z_2 = 1$ in equation (8), we get the
following:

(z) (z) 9 
u я OAD paa) 9) 
$U_2(z) = \lim_{k \to \infty} E[z^{u_{2,k}}] = U(1, z)$

By putting $z_1 = 1$ and $z_2 = z$ in equation (8), we get the
following:

U,(2)™ со AG) + AGE“) 4(2) 4) (10) 
pm [на 200. а-2)| 

$U_3(z) = \lim_{k \to \infty} E[z^{u_{3,k}}] = U(z, z)$

By putting $z_1 = z$ and $z_2 = z$ in equation (8), we get the
following:
r-AG) (z- че; 
4. PACKET DELAY

The packet delay is defined as the total amount of time that
a packet spends in the system, i.e. the number of slots
between the end of the packet's arrival slot and the end of its
departure slot. In this section, we derive expressions for
the pgf's of the packet delay of the three classes.

The amount of time a tagged class 1 packet spends in the
system, i.e. the packet delay for class 1, is given by:

$d_1 = [u_{1,k} - 1]^+ + f_{1,k} + 1$   (12)

Here, $[\ldots]^+$ denotes the maximum of the argument
and zero. Slot k is assumed to be the arrival slot of the tagged
packet, $u_{1,k}$ is the system contents of queue 1 at the beginning
of this slot, and $f_{1,k}$ is defined as the total number of class 1
packets that arrive during slot k and have to be served
before the tagged packet. Similarly, for class 2 and class 3
packets:

$d_2 = [u_{2,k} - 1]^+ + f_{1,k} + f_{2,k} + 1$   (13)
$d_3 = [u_{3,k} - 1]^+ + f_{1,k} + f_{2,k} + f_{3,k} + 1$   (14)
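
As a small worked illustration of the delay expressions (12) to (14), the sketch below evaluates the delay of a tagged packet from the contents of its queue and the same-slot arrivals that precede it. The function name `packet_delay` and the sample numbers are hypothetical and only follow the form of the expressions as read here: the backlog term $[u - 1]^+$, the sum of the $f$ terms, and one slot for the packet's own transmission.

```python
def packet_delay(queue_contents, earlier_arrivals):
    """Delay in slots of a tagged packet, following the form of (12)-(14).

    queue_contents   -- contents of the tagged packet's queue at the
                        beginning of its arrival slot
    earlier_arrivals -- the f terms: same-slot arrivals that must be
                        served before the tagged packet
    """
    backlog = max(queue_contents - 1, 0)   # the [.]^+ operator
    return backlog + sum(earlier_arrivals) + 1


print(packet_delay(0, [0]))      # class 1 packet arriving at an empty queue
print(packet_delay(4, [1, 2]))   # class 2 packet with f_1 = 1 and f_2 = 2
```

With an empty queue and no competing arrivals the delay reduces to the single transmission slot.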


For class 1, the pgf $F_1(z) = E[z^{f_{1,k}}]$ can be calculated for
queue 1:

$F_1(z) = \frac{A_1(z) - 1}{\lambda_1 (z - 1)}$   (15)

$E(z^{d_1}) = F_1(z)\,[U_1(z) + (z - 1)\,U_1(0)]$   (16)
Using equation (9) and (15) in (16), we get: 
dis A(z)-1 
AG-D 
zA (1) (17) 
z¬ A(z) 
EE ±4 (1) - 4A (z) 
= A(x) 
x[r(0* 4,0] 
+(x ~1) x [4,(0) 
Similarly, for class 2 and class 3 the pgf's are given by:

$F_2(z) = \frac{A_2(z) - 1}{\lambda_2 (z - 1)}$

$E(z^{d_2}) = F_1(z)\,F_2(z)\,[U_2(z) + (z - 1)\,U_2(0)]$
^P x -1 4)-1 
AG-D A(-D 


AG) AEX“) _ 
1-4) -A n 
zj“ 1 
51-40) 
[rote 1,021] 
А 4(0) 
HAO 
$F_3(z) = \frac{A_3(z) - 1}{\lambda_3 (z - 1)}$

$E(z^{d_3}) = F_1(z)\,F_2(z)\,F_3(z)\,[U_3(z) + (z - 1)\,U_3(0)]$


(18) 
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d ~AG@)-1, 4@)-1, 4(2-1 
'AG-D AED AG-D (19) 


т (zXz-D,4 _, e z 
|e ee д)+(@-1ух( a) 


5. CALCULATION OF MEAN OF PACKET DELAY 
In this section, we give expressions for the mean values of 
the studied stochastic variables. To make the expressions 


more readable, we define A, and Жу. as follows: 
~ 842,2) _?4 (2) 
An в HE ad is E. 


Equations for mean of packet delays are as follows : 
Ed)" CAPAS ABIT 


_ 4/3) -11-4/3- 4/3 (Ay * 4) 
4, 


(1-4,/3Y -1IX1- A/3P xA, 
A: 
ا‎ 
1-4/3 0 - 4) 
(acea anra eno 
E(d,)= тыны ia | 
АА 
x(1- (1-4, /Зу) 
„ж@-А!ЗУа-@-А /з3у) 


АА 
E(d,)=1-4, 
RTD + (4 + 4 - 21-13 
1-[(A, +4 ¬ 2)1 - А /3)!] 
" (1~ 4X1 A, 13) 
1-[(4 +4 - 2)1 А /3)?] 


6. NUMERICAL EXAMPLE

We assume three types of traffic. Traffic of class 1 is delay
sensitive (e.g. video); delay sensitivity decreases in class order,
so that traffic of class 3 is assumed to be delay insensitive
(for instance data). The packet arrivals in each epoch are
assumed to be i.i.d. with arrival rate $\lambda_T$. An arriving
packet is assumed to be of class j
with probability $\lambda_j / \lambda_T$ (j = 1, 2, 3), where $\lambda_T = \lambda_1 + \lambda_2 + \lambda_3$.
We define $\alpha$ as the fraction of class 1 arrivals in the overall
traffic mix (i.e. $\alpha = \lambda_1 / \lambda_T$). In Fig. 1, the mean packet delays
of the classes are shown against the total arrival rate for $\alpha$ = 0.25.
Values of A,,, А, and Aj, can be calculated using 
Ag7 PAG ), 


i 
д°тт, 


х= 





(20) 


(22) 
where А (д2) = 0-4-20-2)- 50-2)" 
for N = 3 (Total Inlets). 


7. CONCLUSION 

In this paper we analyzed an integrated network system with a
priority scheduling discipline. We have obtained generating
functions and performance measures such as system contents
and mean packet delays. In this model, highly sensitive data is
assigned to the high-priority class and normal data
to the low-priority class. The results and graphs show that
the mean delay of normal data (low-sensitivity data) is greater
than that of highly sensitive data. In past communication systems,
there were generally two types of data transmitted through
a single channel, such as normal data and voice data (or
normal data and multimedia data). Present-day communication,
in contrast, is based on multi-play systems,
where normal data, voice data, multimedia data, broadband
internet, cable TV and internet TV are transmitted through a
single communication channel, which increases the complexity
of networks (i.e. highly sensitive data may experience more delay).
In this model we have considered three types of data (normal
data, voice data, and video or multimedia data). As the results
show, highly sensitive data (i.e. video or multimedia data) has
the minimum delay compared to the other categories of data.
Thus the model can be very helpful in the implementation of
integrated high-speed data networks.


8. FUTURE SCOPE

By implementing the above-mentioned networks we will be
able to improve the performance of integrated high-speed
networks where time delay is the most important issue.
This type of network is also useful for
multi-play systems.
Figure 1. Mean Value of Packet Delays versus the Total
Arrival Rate (at α = 0.25)
ABSTRACT
Various image representation methods consume very large
amounts of memory and take a long transmission time over
the network. In this paper, we have tried to reduce the memory
requirements for storing images so that they can be transmitted
in a short time. Initially, the preamble of a digital
image as well as the matrix representation of a digital image is
discussed. The paper proposes an algorithm to optimize the
image representation and includes a performance evaluation
of this technique compared with the traditional techniques.


1. INTRODUCTION
Digital images are the basis for the visualization and digital
representation of designs on computers and paper [1]. These
are defined in memory by a finite-valued function over a
finite domain. Let us assume that a digital image is a
rectangular array of size M x N having domain D

D = {(r, c) | r = 0, 1, ..., M-1 and c = 0, 1, ..., N-1} [2]

It is represented in memory by a matrix of order M x N having
integer elements g(i, j) for 0 ≤ i ≤ M-1, 0 ≤ j ≤ N-1,
where each element g(i, j) of the matrix is considered as a pixel
element. It represents the gray level (in a monochrome image) or
the color (in a colored image) associated with pixel position (i, j).


There is a problem with the matrix representation of an
image: it takes a very large amount of
memory to store any image, since the color or gray level
of each pixel is stored as an individual element. Even if the color
of some contiguous pixels is the same, it still has to be stored as
an individual element for each pixel. This problem gets worse as the
resolution of the image increases. The larger the memory required for an
image, the longer the time needed to transfer that image over the
network [4].

A number of compression methods have been designed to
reduce the transmission time for image transfer. There are two
major ways to achieve this: either use a dedicated channel
for image transfer or compress the image before transmitting
it. In this paper, we have used the second method, i.e.
compressing the image before transmitting it. Here, rather than
using the matrix representation, the image is represented
using two 1-D arrays. This method is more efficient if a
large number of contiguous pixels have the same color or gray
level value.


2. TRADITIONAL MATRIX REPRESENTATION
OF AN IMAGE
A digital image is a graphical representation of an object that
is stored in a regular matrix, i.e. a collection of pixel-wise
grey levels or intensity values [3].

According to the traditional image representation method, an image
is stored in main memory in matrix form with dimensions
R x C, where R is the number of rows and C is the number of columns;
in other words, the resolution of the screen is R x C. The value of
A[i, j] is the color code of pixel (i, j). The matrix representation
of an image results in a large consumption of memory. If the resolution
of the screen is R x C, the range of colors provided is K and the word
length of the system needed to store K color values is w bytes, then
the memory required to store an image is R x C x
w bytes. In this image representation method, even if the color of
some adjacent pixels is the same, that same value has to be
repeated for each pixel at its corresponding position in the
matrix.
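
The memory requirement R x C x w can be checked with a few lines of code. This is only an illustrative sketch; the function name `matrix_memory_bytes` is hypothetical, and the default word length of 2 bytes follows the assumption stated for Table 1.

```python
def matrix_memory_bytes(rows, cols, word_length=2):
    """Memory needed by the traditional matrix representation, in bytes.

    Every pixel is stored as one element of word_length bytes, regardless
    of how many neighbouring pixels share the same color.
    """
    return rows * cols * word_length


print(matrix_memory_bytes(8, 8))       # 8 x 8 image
print(matrix_memory_bytes(128, 128))   # 128 x 128 image
```

For the 8 x 8 image this gives the 128 bytes quoted in Section 4.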
Resolution | Colors used | Memory utilized in traditional method (in bytes)
Сна 









ус = 
пёхї pe [й — — | 






128x128 [13 — [3r — 
Table 1. Memory requirement in matrix-based image
representation method (assumed word length w = 2
bytes)


3. PROPOSED METHOD 

In the proposed method, we reduce the memory requirement by
taking advantage of adjacent pixels having the same color.
Instead of using the traditional 2-D array for storing
pixel-wise color values, we use two 1-D arrays. The first 1-D
array stores the colors used in adjacent pixels and the second array
stores the count of contiguous pixels having the same color.


If
Color[i] = n, Count[i] = m   ............... (1)
this implies that the next 'm' pixels, starting from the current
one, have the same color value 'n'.
In the proposed method the memory size required is calculated as:
m_r = (c + d) * w
where m_r = memory required
c = size of array color
d = size of array count
w = word length
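
The two-array representation can be sketched as a simple run-length encoder. The code below is a minimal illustration, assuming the image is supplied as a flat, row-by-row list of color codes; the names `encode_runs` and `proposed_memory_bytes` are hypothetical and are not taken from the paper.

```python
def encode_runs(pixels):
    """Build the color and count arrays from a flat list of pixel colors."""
    color, count = [], []
    for value in pixels:
        if color and color[-1] == value:
            count[-1] += 1          # extend the current run
        else:
            color.append(value)     # start a new run
            count.append(1)
    return color, count


def proposed_memory_bytes(color, count, word_length=2):
    """Memory used by both arrays: (c + d) * w bytes."""
    return (len(color) + len(count)) * word_length


# 8 x 8 image in which every row has a single color 0..7
pixels = [row for row in range(8) for _ in range(8)]
color, count = encode_runs(pixels)
print(color, count, proposed_memory_bytes(color, count))
```

For this image both arrays hold eight entries, so the estimate is 32 bytes with a 2-byte word, which agrees with the figure quoted in Section 4.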


Resolution | Colors used | Memory utilized in proposed method


is 2 

















|[lóx16 — |16 [6 | 
(64x64 [6s — — |256 — 5 | 
[64x64 Ja — 2056 o n — 
[64x64 в — todo 
[64x64 — |64 — 256 —— 
| 128x428 |8 — —j4l2 | 
|128 x128 |64 — — [640 — 5 | 
Table 2. Memory requirement in proposed image
representation method


If the resolution of the screen is M x N and the number of colors used
in the image is p, then the memory required to store a symmetric
image according to the proposed method will be:

[(M x N)/p + p] x w
where w is the word length.
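
As a quick numerical check, the expression for a symmetric image can be evaluated directly. The sketch below assumes the expression reads [(M x N)/p + p] x w, which is our reading of the text; the function name `symmetric_memory_bytes` is hypothetical.

```python
def symmetric_memory_bytes(m, n, p, word_length=2):
    """Memory estimate for a symmetric M x N image that uses p colors.

    Assumes the expression [(M x N)/p + p] x w with integer division.
    """
    return (m * n // p + p) * word_length


print(symmetric_memory_bytes(8, 8, 8))        # 8 x 8 image, 8 colors
print(symmetric_memory_bytes(128, 128, 64))   # 128 x 128 image, 64 colors
```

Under this reading, the 8 x 8 image with 8 colors needs 32 bytes, matching the example in Section 4.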
In this method, the image is not represented pixel-wise, so the pixel
position is calculated using the formulas:

a) x = x + 1   if cnt mod r = 0   ............ (4)
   x = x       otherwise

b) y = 0       if y = r - 1       ............ (5)
   y = y + 1   otherwise

In the proposed method, unlike the matrix representation of an image,
the coordinates (x, y) of a pixel are not known; rather, a color c_i and
the number cnt_i of adjacent pixels having the same color c_i are given.
Therefore, an algorithm is required to find the pixel position
from the given arrays color and count.
ALGORITHM:
Step 1) Set x := -1, y := -1, cnt := 0
Step 2) Repeat Step 3 for i = 0 to nc-1
Step 3) Repeat Steps 4 to 7 for k = 0 to count[i]-1
Step 4) If (cnt mod r == 0) then
            x = x + 1
        [End of step 4 If statement]
Step 5) If (y == r-1) then
            y = 0
        Else
            y = y + 1
        [End of step 5 If statement]
Step 6) Drawpixel(x, y, color[i])
Step 7) cnt = cnt + 1
        [End of step 3 for loop]
        [End of step 2 for loop]
Step 8) Exit
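
The steps above can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function name `decode_positions` is hypothetical, `r` is taken to be the number of pixels per row, and the call to Drawpixel is replaced by collecting (x, y, color) tuples so that the result can be inspected.

```python
def decode_positions(color, count, r):
    """Recover (x, y, color) for every pixel from the color/count arrays.

    r is assumed to be the number of pixels in one row of the image.
    """
    x, y, cnt = -1, -1, 0
    pixels = []
    for i in range(len(color)):            # Step 2: one pass per run
        for _ in range(count[i]):          # Step 3: one pass per pixel in the run
            if cnt % r == 0:               # Step 4: a new row starts
                x += 1
            y = 0 if y == r - 1 else y + 1   # Step 5: wrap the column index
            pixels.append((x, y, color[i]))  # Step 6: stands in for Drawpixel
            cnt += 1                       # Step 7
    return pixels


pixels = decode_positions(list(range(8)), [8] * 8, r=8)
print(pixels[0], pixels[8], pixels[-1])
```

Here x advances once per completed row and y cycles through 0 to r-1, so a run may span several rows without any special handling.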


4. COMPARISON OF PROPOSED METHOD WITH
TRADITIONAL MATRIX REPRESENTATION
Whereas in the traditional method we have to store a color value for
each pixel according to the resolution of the screen, in the proposed
method we just store the number of pixels that have color
value color[i], color[i+1] and so on.

Proposed representation of an 8 x 8 image with 8
colors:


color[8] = {0, 1, 2, 3, 4, 5, 6, 7};
count[8] = {8, 8, 8, 8, 8, 8, 8, 8};
Traditional representation of an 8 x 8 image with 8
colors:
color[8][8] = {{0, 0, 0, 0, 0, 0, 0, 0},

{1, 1, 1, 1, 1, 1, 1, 1},

...

{7, 7, 7, 7, 7, 7, 7, 7}};

Representation of an image according to the proposed method will
consume less memory than the traditional method. This is
because here the image is stored color-wise and not pixel-wise.
To draw an image of resolution 8 x 8 where 8 colors are
used in a symmetric way, the memory required to store such an
image in matrix form will be 128 bytes, whereas if we store the
same image according to the proposed method we need only 32 bytes.
It will not only save memory but also save execution time.
The time taken to draw the above image represented in the traditional
method is noted to be 9844429.978022 ns, whereas in the
proposed method the time is noted to be 9844442970.615385 ns.
As the resolution increases, more and more memory is saved.
Also, the lower the frequency with which the color changes, the more
memory will be saved in the proposed method as compared to the
traditional method, as shown below:


frequency of 2 4 

change in color . 
Memory Saved (in ЕНДЙ 87.45 

% 


Table 3. Memory saved with change in color 
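
The percentage of memory saved can be estimated by comparing the two memory expressions. The sketch below is illustrative only: the function name `memory_saved_percent` is hypothetical, and the proposed-method memory is assumed to follow the symmetric-image expression [(M x N)/p + p] x w, with p standing in for the number of color changes.

```python
def memory_saved_percent(m, n, p, word_length=2):
    """Percentage of memory saved relative to the matrix representation.

    Assumes a symmetric M x N image with p colors, stored as
    [(M x N)/p + p] x w bytes in the proposed method.
    """
    traditional = m * n * word_length
    proposed = (m * n // p + p) * word_length
    return 100.0 * (1.0 - proposed / traditional)


print(round(memory_saved_percent(128, 128, 8), 2))
```

Under these assumptions a 128 x 128 image with 8 colors yields a saving of about 87.45 %, the same value as the legible entry of Table 3.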





The proposed method not only saves memory, but also saves
execution time, as shown:










AXO |A — 954342861 | 9544428252 
128X128 [16 — 9544422003 | 9544431672 | 
128 XIE |64 — 9544433545 | 5544433071 | 
128 x 128 | 128 | 954443388.3 | 954443357.1


Table 4. Comparison of execution time taken for traditional
and proposed method


5. CONCLUSION 

In this paper digital images are represented using color codes
of contiguous pixels, which is different from the matrix-based
representation that was traditionally used. It leads to a
reduction in the memory required for storing an image. It also
saves image transmission time over the network.
A comparison of the memory utilized in the traditional and proposed
methods has been made using charts.
Editorial


It is a matter of both honor and pleasure for us to put forth the fifth issue of BIJIT, the BVICAM's
International Journal of Information Technology. This issue has been dedicated as a Special Issue on
“Mobile Ad-Hoc Networks”. It presents a compilation of ten papers that span a broad variety of research
topics in various emerging areas of Information Technology and Computer Science. Seven papers are
included under the special issue on “Mobile Ad-Hoc Networks” and the remaining three papers are on general
themes. Some application-oriented papers, having novelty in application, have also been included in this
issue, hoping that usage of these would enrich the knowledge base and facilitate the overall economic
growth. This issue shows our commitment in realizing our vision “to achieve a standard comparable to the
best in the field and finally become a symbol of quality”.


As a matter of policy of the Journal, all the manuscripts received and considered for the Journal by the
editorial board are double-blind peer reviewed independently by at least two referees. Our panel of expert
referees possess a sound academic background and have a rich publication record in various prestigious
journals representing Universities, Research Laboratories and other institutions of repute, which we intend
to further augment from time to time. Finalizing the constitution of the panel of referees, for double-blind
peer review(s) of the considered manuscripts, was a painstaking process, but it helped us to ensure that the
best of the considered manuscripts are showcased and that too after undergoing multiple cycles of review,
as required.


The ten papers that were finally published were chosen out of more than eighty papers that we received
from all over the world for this issue. We understand that the confirmation of final acceptance, to the
authors / contributors, is delayed, but we also hope that you concur with us in the fact that quality review is
a time-taking process and is further delayed if the reviewers are senior researchers in their respective fields
and hence are hard pressed for time.


We wish to express our sincere gratitude to our panel of experts in steering the considered manuscripts 
through multiple cycles of review and bringing out the best from the contributing authors. We thank our 
esteemed authors for having shown confidence in BIJIT and considering it a platform to showcase and 
share their original research work. We would also wish to thank the authors whose papers were not 
published in this issue of the Journal, probably because of the minor shortcomings. However, we would 
like to encourage them to actively contribute for the forthcoming issues. A very special thanks to the Guest 
Editor; Dr. D. K. Lobiyal and Joint Editor; Mrs. Umang for having taken pain and finalized the papers of 


The undertaken Quality Assurance Process involved a series of well defined activities that, we hope, went a 
long way in ensuring the quality of the publication. Still, there is always a scope for improvement, and so 
we request the contributors and readers to kindly mail us their criticism, suggestions and feedback at 
bijit@bvicam.ac.in and help us in further enhancing the quality of forthcoming issues.


SPECIAL SECTION 
“Mobile Ad-Hoc Networks” 


BVICAM's International Journal of Information Technology (BIJIT)
Hash Security for Ad hoc Routing
Ashwani Kush¹ and C. Hwang²


Submitted in June 2010; Accepted in November 2010 


ABSTRACT 

A recent trend in ad hoc network routing is the reactive
on-demand philosophy, where routes are established only when
required. Most of the protocols in this category do not
incorporate proper security features. The ad hoc
environment is accessible to both legitimate network users and
malicious attackers. It has been observed that different
protocols need different strategies for security. An attempt has
been made to review some of the existing protocols. Finally, a
new scheme based on hashing has been proposed to secure an
existing protocol. A one-way hash chain is used to protect
hop-by-hop transmission. The scheme has been incorporated using
AODV as the base protocol and results have been explained using
NS.


KEYWORDS 
Security, Ad hoc networks, Routing protocols, Key 
Management, AODV. 


1.0 INTRODUCTION 

An Ad hoc wireless network is a collection of mobile devices 
equipped with interfaces and networking capability. It is 
adaptive in nature and is self organizing. A formed network can 
be de-formed and again formed on the fly and this can be done 
without the help of system administration. Each node may be 
capable of acting as a router. Applications include but are not 


research topic in wired networks, the unique characteristics of 
Ad Hoc networks present a new set of nontrivial challenges to 
security design. These challenges include open network 
architecture, shared wireless medium, stringent resource 
constraints, and highly dynamic topology. Consequently, the 
existing security solutions for wired networks do not directly 
apply to the Ad Hoc environment. The main goal of the 
security solutions for an Ad Hoc network is to provide security 
services, such as authentication, confidentiality, integrity, 


design perspective is the lack of a clear line of defence. Unlike
wired networks that have dedicated routers, each mobile node
in an ad hoc network may function as a router and forward
packets for other peer nodes. The wireless channel is accessible
to both legitimate network users and malicious attackers. In
such an environment, there is no guarantee that a path between
two nodes would be free of malicious nodes, which would not
comply with the employed protocol and would attempt to harm the
network operation. The rest of the paper is organized as follows: Section 2
discusses security challenges, a survey of various protocols is
given in Section 3, Section 4 describes the new scheme, and the
conclusion is given in Section 5.


2.0 SECURITY CHALLENGES 
In this paper, the prime concern is with attacks targeting the
routing protocols for ad hoc networks. These attacks [2,3,4,5]
can be broadly classified into two main categories: passive
attacks and active attacks.


2.1 PASSIVE ATTACKS 
Passive attacks are attacks in which an attacker does not
actively participate in bringing the network down. An attacker
just eavesdrops on the network traffic to determine which
nodes are trying to establish routes, or which nodes are pivotal
to the proper operation of the network and hence can be potential
candidates for subversion and for launching denial of service
attacks. The attacker can then forward this information to an
accomplice, who in turn can use it to launch attacks to bring
down the network. The nature of attacks varies greatly from
one set of circumstances to another.


2.2 ACTIVE ATTACKS 

These attacks involve some modification of the data stream or 
the creation of a false stream. It is quite difficult to prevent 
active attacks absolutely, as this would require physical 
protection of all communications facilities and paths at all 
times. Instead, the goal is to detect them and to recover from 
any disruption or delays caused by them. Figure 1 is a 
description of active and passive attacks. 

Attacks on an ad hoc network can be categorized as follows:

2.2.1 Location Disclosure: This attack targets the privacy
requirements of an ad hoc network.

2.2.2 Black Hole: In a black hole attack a malicious node
gives false route replies to advertise itself as having
the shortest path to a destination.

2.2.3 Replay: An attacker that performs a replay attack injects into
the network routing traffic that has been captured
previously.

2.2.4 Wormhole: The wormhole attack is one of the most
powerful ones since it involves the cooperation
between two malicious nodes that participate in the
network.

2.2.5 Blackmail: This attack is relevant against routing
protocols that use mechanisms for the identification of
malicious nodes and propagate messages that try to
blacklist the offender.

2.2.6 Denial of Service: Denial of service attacks aim at the
complete disruption of the routing function and
therefore the entire operation of the ad hoc network.

2.2.7 Rushing Attack: A rushing attack results in
denial-of-service when used against all previous on-

authentication system.

2.2.9 Passive Listening and Traffic Analysis: The intruder
could passively gather exposed routing information.
Such an attack cannot affect the operation of the routing
protocol, but it is a breach of user trust to routing the


Figure 1: (a) Passive Attack (b) Active Attack 
3.0 SECURE ROUTING PROTOCOLS

In this section some of the popular secure routing protocols are
analyzed. An effort has been made to apply the same metrics to
all of them and to remain unbiased.


3.1 ARAN [6]: Dahill et al. proposed ARAN [6]. It assumes a
managed-open environment, where there is a possibility for
pre-deployment of infrastructure. It consists of two distinct
stages. The first stage is the certification and end-to-end
authentication stage. Here the source gets a certificate from the
trusted certification server and then, using this certificate, signs
the request packet. Each intermediate node in turn signs the
request with its certificate. The destination then verifies each of
the certificates; thus the source gets authenticated and so do the
intermediate nodes. The destination node then sends the reply
along the reverse of the route taken by the request, with the reply
signed using the certificate of the destination. The second stage
is a non-mandatory stage used to discover the shortest path to the
destination, but this stage is computationally expensive. It is
prone to replay attacks using error messages unless the nodes
have time synchronization. Authenticated Routing for Ad-hoc
Networks (ARAN) detects and protects against malicious
actions by third parties and peers in an Ad-hoc environment.
ARAN introduces authentication, message integrity and
non-repudiation to an Ad-hoc environment [7].


Characteristics:

(i) ARAN is able to take care of Replay attacks
(ii) It is able to eliminate Rushing attacks
(iii) It does not effectively deal with location disclosure
(iv) It has no provision for Black Hole and Wormhole
(v) It does not secure against Denial of Service
(vi) ARAN is loop free
(vii) It is based on an online trusted certification authority


3.2 SEAD [9]: This Secure Efficient Ad hoc Distance vector 
routing protocol (SEAD) is robust against multiple 
uncoordinated attackers creating incorrect routing state in any 
other node, in spite of active attackers or compromised nodes 
in the network [9]. To support use of SEAD with nodes of 
limited CPU processing capability and to guard against DoS 
attacks in which an attacker attempts to cause other nodes to 
consume excess network bandwidth or processing time, it uses 
efficient one-way hash functions. It is based on DSDV. It has 
been designed to protect routing update packets. 
Characteristics:

(i) SEAD is able to take care of Replay attacks
(ii) It is able to eliminate Rushing attacks
(iii) It does not effectively deal with location disclosure
(iv) It has no provision for Black Hole and Wormhole
(v) It secures against Denial of Service
(vi) SEAD is table driven
(vii) It is based on clock synchronization
(viii) It is loop free and uses Distance as route metric
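
The efficient one-way hash functions mentioned for SEAD can be illustrated with a hash chain. The following is only a minimal sketch of the general technique, not SEAD's actual packet format: the function names, the use of SHA-256 and the chain length are assumptions made for illustration.

```python
import hashlib
from typing import List


def build_hash_chain(seed: bytes, length: int) -> List[bytes]:
    """Build h_1 = H(seed), h_2 = H(h_1), ..., h_length (hypothetical helper)."""
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(length - 1):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def verify_element(candidate: bytes, anchor: bytes, max_steps: int) -> bool:
    """Check that hashing `candidate` at most `max_steps` times reaches `anchor`.

    The anchor is the last chain element, assumed to be distributed to the
    verifier in an authenticated way beforehand.
    """
    value = candidate
    for _ in range(max_steps):
        value = hashlib.sha256(value).digest()
        if value == anchor:
            return True
    return False


chain = build_hash_chain(b"secret-seed", 5)
anchor = chain[-1]
print(verify_element(chain[2], anchor, 5))    # earlier element verifies
print(verify_element(b"x" * 32, anchor, 5))   # forged value does not
```

Because the hash is one-way, a node that only knows a later element of the chain cannot produce an earlier one, which is what makes this cheap primitive usable for authenticating routing updates on nodes with limited CPU capability.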


3.3 SRP [9]: The Secure Routing Protocol [9] (lightweight
security for DSR [16]) can be used with DSR; SRP is designed
as an extension header that is attached to ROUTE REQUEST
and ROUTE REPLY packets. SRP doesn't attempt to secure
ROUTE ERROR packets but instead delegates the
route-maintenance function to the Secure Route Maintenance
portion of the Secure Message Transmission protocol. SRP
uses a sequence number in the REQUEST to ensure freshness,
but this sequence number can only be checked at the target.
SRP requires a security association only between
communicating nodes and uses this security association just to
authenticate ROUTE REQUESTs and ROUTE REPLYs
through the use of message authentication codes. At the target,
SRP can detect modification of the ROUTE REQUEST, and at
the source, SRP can detect modification of the ROUTE
REPLY. It defends against attacks that disrupt the route
discovery process. It is used with DSR and ZRP. It uses the
mechanism of a secure certificate server.

Characteristics:

(i) SRP is able to take care of Replay attacks
(ii) It is not able to eliminate Rushing attacks
(iii) It does not effectively deal with location disclosure
(iv) It has no provision for Black Hole, Wormhole and
invisible node attacks
(v) It secures against Denial of Service
(vi) SRP is loop free and uses Distance as route metric
(vii) It uses the existence of a security association between each
Source and Destination
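
The way a security association and a sequence number let the target authenticate a ROUTE REQUEST can be sketched as follows. This is an illustrative sketch only: the field layout, the function names and the choice of HMAC-SHA256 are assumptions, not the SRP header format.

```python
import hashlib
import hmac


def request_mac(shared_key: bytes, source: str, target: str, seq: int) -> bytes:
    """MAC over the immutable request fields (hypothetical field layout)."""
    message = f"{source}|{target}|{seq}".encode()
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def target_accepts(shared_key: bytes, source: str, target: str, seq: int,
                   mac: bytes, last_seen_seq: int) -> bool:
    """Target-side check: the MAC must match and the sequence number must be fresh."""
    expected = request_mac(shared_key, source, target, seq)
    return hmac.compare_digest(expected, mac) and seq > last_seen_seq


key = b"key shared by S and D"       # stands in for the security association
mac = request_mac(key, "S", "D", 5)
print(target_accepts(key, "S", "D", 5, mac, last_seen_seq=4))  # fresh and authentic
print(target_accepts(key, "S", "D", 5, mac, last_seen_seq=5))  # replayed request
```

Only the two end nodes hold the key, so intermediate nodes cannot check the MAC; this matches the statement that the sequence number can only be checked at the target.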


3.4 SECURE AODV [10]: SAODV [10] implements two
concepts: a secure binding of IPv6 addresses that is
independent of any trusted security service, and signed evidence
produced by the originator of the message, with signature
verification by the destination, without any form of delegation
of trust. The AODV [15] protocol comprises two basic
mechanisms, route discovery and maintenance of local
connectivity. The SAODV protocol adds security features to
the basic AODV mechanisms, but is otherwise identical. A
source node that requests communication with another member
of the MANET, referred to as a destination D, initiates the
process by constructing and broadcasting a signed route request
message (RREQ).

Characteristics:

(i) SAODV is able to take care of Replay attacks
(ii) It is not able to eliminate Rushing attacks
(iii) It does not effectively deal with location disclosure
(iv) It has no provision for Black Hole and Wormhole
(v) It does not secure against Denial of Service
(vi) SAODV uses an online key management scheme for
acquisition and verification of keys
(vii) It is loop free and uses Distance as routing metric


3.5 SLSP [11]: The Secure Link State Protocol (SLSP) [11] for 
mobile ad hoc networks is responsible for securing the 
discovery and distribution of link state information. The scope 
of SLSP may range from a secure neighborhood discovery to a 
network-wide secure link state protocol. SLSP nodes 
disseminate their link state updates and maintain topological 
information for the subset of network nodes within R hops,
which is termed as their zone. Nevertheless, SLSP is a
self-contained link state discovery protocol, even though it draws
from, and naturally fits within, the concept of hybrid routing.
To counter adversaries, SLSP protects link state update (LSU) 
packets from malicious alteration, as they propagate across the 
network. 


Characteristics:

(i) SLSP is able to take care of Replay attacks
(ii) It is not able to eliminate Rushing attacks
(iii) It does not effectively deal with location disclosure
(iv) It has no provision for Black Hole and Wormhole
(v) It secures against Denial of Service
(vi) SLSP is table driven and loop free
(vii) It assumes that nodes must have their public keys
certified by a trusted party
(viii) It uses Distance as Routing metric


3.6 ARIADNE [12]: ARIADNE, a secure on-demand routing
protocol for ad hoc networks, uses the TESLA [13]
broadcast authentication protocol for authenticating routing
messages, since TESLA is efficient and adds only a single
message authentication code (MAC) to a message for broadcast
authentication. Adding a MAC (computed with a shared key) to
a message can provide secure authentication in point-to-point
communication; for broadcast communication, however,
multiple receivers need to know the MAC key for verification,
which would also allow any receiver to forge packets and
impersonate the sender. Secure broadcast authentication thus
requires an asymmetric primitive, such that the sender can
generate valid authentication information, but the receivers can
only verify the authentication information. It is used with DSR.
It is prone to selfish node attacks. It prevents attackers from
tampering with uncompromised routes.

Characteristics:

(i) ARIADNE is able to take care of Replay attacks and is
immune to the wormhole attack
(ii) It is able to eliminate Rushing attacks
(iii) It does not effectively deal with location disclosure
(iv) It has no provision for Black Hole
(v) It secures against Denial of Service
(vi) It uses TESLA keys distributed to participating nodes
(vii) It is loop free and uses Distance as Routing metric


3.7 SAR [14]: Security-Aware ad hoc Routing (SAR)
incorporates security attributes as parameters into ad hoc route
discovery. SAR enables the use of security as a negotiable
metric to improve the relevance of the routes discovered by ad
hoc routing protocols. We assume that the base protocol is an
on-demand protocol similar to AODV or DSR. In the original
protocol, when a node wants to communicate with another
node, it broadcasts a Route Request (RREQ) packet to its
neighbors. It is used with AODV. It uses sequence numbers and
time stamps to stop replay attacks. The route discovered
may not be the shortest one.
Characteristics:

SAR is loop free
It uses Security requirement as Routing metric
SAR uses a key distribution or secret sharing mechanism
SAR is not loop free, it depends upon the selected security
requirement
It is able to take care of Replay attacks
It is not able to eliminate Rushing attacks
It does not effectively deal with location disclosure
It secures against Denial of Service


4.0 PROPOSED PLAN 

When a source node S needs to discover a route to a destination
node D, it initiates a route request (RREQ) message, which
includes the source node (S) and destination node (D), a
request sequence number, and an initial hash value. The initial
hash value is computed as H0 = Hash[n], where n is a
random number. The source node S appends the computed
initial hash value H0, and then broadcasts the RREQ packet
[...] the validity of the source node. If any checking process
fails, the node discards the packet; otherwise, it rebroadcasts it.
Any intermediate node, say 1, receiving the packet checks whether
it has already seen this packet by recognizing the combination
of (source node, request sequence number). If it has, it discards
the packet, as in regular AODV; otherwise it adds its address to
the node list, replaces the hash value field with Hash(1,
previous hash value) and rebroadcasts the packet.


S: H0 = Hash[n]
S -> RREQ, S, D, Seq#, { }, H0

1: H1 = Hash[1, H0]
1 -> RREQ, S, D, Seq#, {1}, H1

2: H2 = Hash[2, H1]
2 -> RREQ, S, D, Seq#, {1,2}, H2

D: [D, S, {1,2}, Seq#]
D -> 2: RREP, D, S, {1,2}, Seq#
2 -> 1: RREP, D, S, {1,2}, Seq#
1 -> S: RREP, D, S, {1,2}, Seq#

Figure 1: Packets exchanged between nodes during RREQ
phase.


When the destination node receives the RREQ, it performs a
sequence of checks. It first decrypts the received
ciphertext and compares the result with the routing message
received. If the comparison indicates a match, node D gets the
initial hash value H0. It then verifies the source node S.
If the sequence number is greater than the last sequence number
received from S, it checks whether the hash chain field is equal
to

$H[n_k, H[n_{k-1}, \ldots, H[n_1, H_0] \ldots ]]$

where $n_1, \ldots, n_k$ are the nodes in the accumulated node
list. If any step of the above checking process fails, the
authentication fails, and the destination node discards the
RREQ packet. Otherwise, to build the RREP, it copies the
accumulated node list from the RREQ packet, reverses it, and
puts it in the source route.
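
The per-hop hash update and the destination check described above can be sketched as follows. This is a minimal sketch under stated assumptions: SHA-256 stands in for the unspecified Hash function, the dictionary layout and function names are hypothetical, and the destination is assumed to have already obtained H0 (the decryption step is not modelled).

```python
import hashlib
from typing import Optional, Set, Tuple


def initial_hash(n: bytes) -> bytes:
    """H0 = Hash[n], with n a random number chosen by the source."""
    return hashlib.sha256(n).digest()


def hash_step(node_id: str, previous: bytes) -> bytes:
    """H_i = Hash[node_i, H_(i-1)]."""
    return hashlib.sha256(node_id.encode() + b"|" + previous).digest()


def forward_rreq(rreq: dict, node_id: str,
                 seen: Set[Tuple[str, int]]) -> Optional[dict]:
    """Intermediate node: drop duplicates, else extend node list and hash."""
    key = (rreq["src"], rreq["seq"])
    if key in seen:
        return None  # already seen (source node, request sequence number)
    seen.add(key)
    return {**rreq,
            "nodes": rreq["nodes"] + [node_id],
            "hash": hash_step(node_id, rreq["hash"])}


def destination_check(rreq: dict, h0: bytes, last_seq: int) -> bool:
    """Destination: sequence number must be fresh and the hash chain must match."""
    if rreq["seq"] <= last_seq:
        return False
    expected = h0
    for node_id in rreq["nodes"]:
        expected = hash_step(node_id, expected)
    return expected == rreq["hash"]


h0 = initial_hash(b"random-n")
rreq = {"src": "S", "dst": "D", "seq": 1, "nodes": [], "hash": h0}
at_1 = forward_rreq(rreq, "1", set())
at_2 = forward_rreq(at_1, "2", set())
if destination_check(at_2, h0, last_seq=0):
    source_route = list(reversed(at_2["nodes"]))  # route used by the RREP
    print(source_route)
```

A node that removes an earlier hop from the node list cannot recompute a matching hash value without knowing H0, which is why the destination recomputes the whole chain from H0.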

As is evident from the proposed scheme, the packet format size
increases with the inclusion of hash key generation. The routing
load will increase due to the incorporation of security. It is also
clear that the scheme affects the packet delivery fraction and
end-to-end delay. The packet delivery fraction will be
marginally reduced. The chance of packet drops may also increase
due to the delay produced in the route reply case. This could be
improved by using higher timeouts for packets buffered for
route discovery.


Graph 1: PDF using pause time (x-axis: pause time, 0 to 500)





A simulation study has been carried out to evaluate the
performance of the proposed protocol. The simulation environment
used for this study is NS-2 [20]. The area selected is 1 x 1 km and
50 nodes have been taken. Pause time is varied from 0 to 500 sec.
A pause time of 500 means minimum movement and 0 means
maximum movement. TCP packets are used.

Graph 1 shows the packet delivery ratio as a function of pause
time. The packet delivery ratio is the fraction of packets that are
successfully delivered. This performance measure determines the
completeness and correctness of the routing protocol. A pause
time of 0 means very high mobility.

Graph 2: End-to-end delay




As the graph indicates, 'Secured' has a lower number of packets
delivered, but this reduction in delivery is due to hash key
calculations and evaluations. Graph 2 represents the end-to-end
delay with respect to pause time. Average end-to-end delay is
the delay experienced by the successfully delivered packets in
reaching their destinations. A higher end-to-end delay is observed
in this case for 'Secured'. The reason is again the additional
calculation involved in hash key estimation. It should be
noted here that only trusted packets are delivered, so some
packets are dropped for this reason as well.
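
The two performance measures used in the graphs can be computed from a simulation trace roughly as follows. This is only a sketch: the function names and the trace representation (packet id mapped to send and receive time) are assumptions and do not reflect the NS-2 trace format.

```python
from typing import Dict


def packet_delivery_fraction(sent: int, received: int) -> float:
    """Fraction of sent packets that were successfully delivered."""
    if sent == 0:
        raise ValueError("no packets were sent")
    return received / sent


def average_end_to_end_delay(send_times: Dict[int, float],
                             recv_times: Dict[int, float]) -> float:
    """Mean delay over successfully delivered packets only."""
    delays = [recv_times[pid] - send_times[pid]
              for pid in recv_times if pid in send_times]
    if not delays:
        raise ValueError("no packets were delivered")
    return sum(delays) / len(delays)


# Hypothetical trace: packet 2 is dropped and therefore ignored in the delay.
send_times = {1: 0.0, 2: 1.0, 3: 2.0}
recv_times = {1: 0.5, 3: 3.0}
print(packet_delivery_fraction(len(send_times), len(recv_times)))
print(average_end_to_end_delay(send_times, recv_times))
```

Since dropped packets do not contribute to the delay average, a scheme that discards untrusted packets lowers the delivery fraction without that loss showing up in the delay figure.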

The reduction in packet delivery ratio and the increase in
end-to-end delay do not by themselves reflect the effectiveness
of the proposed scheme. This change is to be expected, as more
packets are sacrificed to keep the traffic secure. Security is
achieved at the cost of performance. Efforts are on to reduce the
margins by reducing the size of the hash key.


5.0 CONCLUSION 


[...] schemes. An attempt has been made to present an
overview of the existing security scenario in the Ad-Hoc
network environment. Hash key management has been
proposed as one of the best options for security, though other
options can also be considered depending upon the need for
security. As the hash key chain is configured as a recursive chain,
these keys are noted in the route table. An important feature is
that the routing protocol functions very similarly to the existing
one when there are no external attacks. Whenever an attack occurs,
additional packets need to be sent to change the routes
established by the malicious control packets. This increased
traffic size will have its impact on overhead. The overhead is
bound to increase with it, but in view of the better secured
routing this has to be accepted to achieve the desired
results. Efforts are on to simulate the proposed scheme with
[...] to work better in dense environments, as selection of a path
becomes easy in case of failures. Ad hoc networking is still a
raw area of research, as can be seen from the problems that exist
in these networks and the emerging solutions. Several protocols
for secured routing in Ad-hoc networks have been proposed.
There is a need to make them more secure and robust to adapt
to the demanding requirements of these networks. The current
security mechanisms each defeat one or a few routing attacks. It
is still a challenging task to design routing protocols resistant to
multiple attacks.


6.0 FUTURE SCOPE 

[...] as well. DSR and TORA will also be compared with the
proposed scheme, and this concept will be implemented in them.
A dense environment has been used in this scheme. Efforts are on
to make the scheme robust for sparse environments as well.
ABSTRACT

A mobile ad hoc network (MANET) is a temporary network
which is formed by a group of wireless mobile devices without
the aid of any centralized infrastructure. In such environments,
finding the identity of a mobile device and maintaining the
paths between any two nodes are challenging tasks. In real
time, the limited propagation range of a mobile device restricts
its identity to its neighbors only, and a new host that enters a
MANET does not know the complete details of that
instantaneous MANET. This paper analyses the possibility of
content based route discovery and proposes a framework for
request based route discovery and path maintenance using ant
agents. The ant agents fetch routing information along with
content relevancy, which has a major influence on the
pheromone value. The pheromone value is used to find the
probability of goodness. The proposed framework consists of
ant structures and algorithms for route discovery and path
maintenance.


KEYWORDS 
MANETS, ANT, Request Based Path Setup 


1.0 INTRODUCTION 

A lot of research work is going on in the development of routing
algorithms for MANETs. The swarm intelligence based routing
algorithms include Antnet [5], ARA [3] and AntHocNet [4]. In
general, routing algorithms are categorized as either proactive or
reactive. Proactive routing algorithms update routing tables
constantly, whereas reactive routing algorithms update routing
information only when required. In the path maintenance phase
the ants' exploratory behavior is limited to the neighborhood of
the current optimal path. The basic design behind ACO algorithms
for routing is the acquisition of routing information through path
sampling using ant agents. These ant agents are generated
concurrently and independently by the source nodes, with the task
of trying out a path to an assigned destination. The assigned
destination is an assumption in all existing algorithms, i.e., the
user has to specify source and destination addresses manually. In
range-limited networks there is no standard approach to
identify the destination node. In the general algorithms, forward
ants always attempt to discover newer routes, and backward ants
update path quality and maintain pheromone values. The pheromone
value is a measure of the probability of goodness of going over
that neighbor on the way to the destination.

In this paper content based route discovery is proposed; the
precise number of forward ant generations and the implementation
of the heuristic routing methodology are given as algorithms. The
rest of the paper is organized as follows. In Section II, the
different types of forward ants and backward ants which ensure
content based route discovery and guaranteed data
transaction are discussed. The timer introduced here is meant to
reduce the number of backward ants on heavily loaded or
congested paths. The XML privileges are taken into account to
maintain consistent forward ant and backward ant
functionalities. Section III presents the proposed algorithms for
forward ants, Section IV discusses the path updating algorithms,
and Section V explains the simulation and results.


2.0 ANT AND HOST PROFILE 

A FORWARD ANT WITH CONTENT TAG 

When a source node needs some information or content from
an existing MANET, it first checks the cache for existing
routes. When no routes are known, it broadcasts forward
request ants with a content tag, and these are propagated through
the network until they reach the maximum hop count. The forward
ant carries the content to be searched. When relevant content is
found, the forward ant is converted into a backward ant; at the
same time the forward ant continues its travel in search of more
relevant content until it reaches the maximum hop count. A forward
ant at each intermediate node selects the next hop using the
information stored in the routing table of that node, or by
rebroadcast. The timer attribute is used to find congested paths
for load balancing. The forward ant initializes the timer value to
zero and increments the value in milliseconds until it reaches the
destination. When a forward ant finds the relevant content on
an authenticated node, a backward ant is generated as in
AntHocNet [4], which takes the same path in the opposite
direction. The backward ant updates the pheromone value as it
moves on its way to the source node. The content relevancy and
availability ratio decide the pheromone value; more relevant
content increases the pheromone value. The definition of the
forward ant is given in XML format to acquire all benefits of XML
in data delivery. XML can easily be combined with DTD (Document
Type Definition) and XML Schema for integrity checking, and with
SOAP and XML-RPC for accessing remote methods and devices, and it
can also benefit from XML Security.
2.1 Schema definition


<?xml version="1.0"?>
<xsi:schema
    xmlns:xsi="http://www.w3.org/2001/XMLSchema">
  <xsi:element name="fwdant">
    <xsi:complexType>
      <xsi:sequence>
        <xsi:element name="fwd" type="xsi:integer"/>
        <xsi:element name="req" type="xsi:integer"/>
        <xsi:element name="payload" type="xsi:integer"/>
        <xsi:element name="hopcount" type="xsi:integer"/>
        <xsi:element name="maxhopcount"
            type="xsi:integer"/>
        <xsi:element name="timer" type="xsi:integer"/>
        <xsi:element name="srcaddr" type="xsi:integer"/>
        <xsi:element name="destaddr" type="xsi:integer"/>
        <xsi:element name="content" type="xsi:string"/>
        <xsi:element name="path">
          <xsi:complexType>
            <xsi:sequence>
              <xsi:element name="n1"
                  type="xsi:integer" maxOccurs="unbounded"
                  minOccurs="0"/>
            </xsi:sequence>
          </xsi:complexType>
        </xsi:element>
      </xsi:sequence>
    </xsi:complexType>
  </xsi:element>
</xsi:schema>


2.2 Forward ant structure 


The forward ant structure could be combined with XML 
Schema and explored to all its neighbors to discover content 
and new routes. 


<fwdant id="no">
  <fwd>1</fwd>
  <req>1</req>
  <payload>0</payload>
  <hopcount>CurrentHopCount</hopcount>
  <maxhopcount>Theory Standards</maxhopcount>
  <timer>00:00:00:00</timer>
  <srcaddr>Address</srcaddr>
  <nexthop>Neighbor Address</nexthop>
  <destaddr>Empty</destaddr>
  <content>Content To be searched</content>
  <path>
    <n1>Neighbor1</n1>
    <n2>Neighbor2</n2>
    <nN>NeighborN</nN>
  </path>
</fwdant>
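
How an intermediate node might read such a forward ant can be sketched with a standard XML parser. The sample values (addresses, hop counts, the id) are hypothetical, and the helper names are assumptions; the sketch only shows parsing and the maximum hop count test, not schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical forward ant instance following the structure shown above.
SAMPLE_ANT = """
<fwdant id="1">
  <fwd>1</fwd>
  <req>1</req>
  <payload>0</payload>
  <hopcount>2</hopcount>
  <maxhopcount>5</maxhopcount>
  <timer>00:00:00:00</timer>
  <srcaddr>10.0.0.1</srcaddr>
  <nexthop>10.0.0.4</nexthop>
  <destaddr>Empty</destaddr>
  <content>JAVA</content>
  <path>
    <n1>10.0.0.2</n1>
    <n2>10.0.0.3</n2>
  </path>
</fwdant>
"""


def parse_forward_ant(xml_text: str) -> dict:
    """Extract the fields a node needs in order to decide what to do with the ant."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "hopcount": int(root.findtext("hopcount")),
        "maxhopcount": int(root.findtext("maxhopcount")),
        "content": root.findtext("content").strip(),
        "path": [node.text.strip() for node in root.find("path")],
    }


def may_forward(ant: dict) -> bool:
    """A forward ant keeps travelling only below the maximum hop count."""
    return ant["hopcount"] < ant["maxhopcount"]


ant = parse_forward_ant(SAMPLE_ANT)
print(ant["content"], ant["path"], may_forward(ant))
```

A malformed ant makes `ET.fromstring` raise `ParseError`, which gives a node a simple place to discard corrupted ants before any routing decision is made.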


B NODE PROFILE 

It is assumed that the user sets a profile for the device which
participates in the MANET. The profile states the nature of the
content the device has and defines access rights. A forum could be
constituted to define the profile format. If a node wishes to
contribute or distribute its data, it can tag the content's
availability in various categories. The first is public content,
which can be accessed by any node in that network; data movement
and ant movement will not be sensed by the user of that device. In
other terms, public content is delivered without
human intervention. The second content type is categorized as
protected data. Protected data is also shown for public
view: the content tag can be identified by any node, but content
delivery has to be authenticated by the user of the target host.
[...] provide a secured way of protected data transaction. The
third is categorized as private content, which cannot be accessed
by other hosts.
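
The three access categories can be summarized as a small decision rule. This is an illustrative sketch: the enum and function names are hypothetical, and the `owner_approved` flag stands in for the authentication by the user of the target host.

```python
from enum import Enum


class Access(Enum):
    PUBLIC = "public"
    PROTECTED = "protected"
    PRIVATE = "private"


def tag_visible(access: Access) -> bool:
    """Public and protected content tags can be identified by any node."""
    return access in (Access.PUBLIC, Access.PROTECTED)


def can_deliver(access: Access, owner_approved: bool = False) -> bool:
    """Public content is delivered without human intervention; protected
    content needs approval by the owner; private content is never delivered."""
    if access is Access.PUBLIC:
        return True
    if access is Access.PROTECTED:
        return owner_approved
    return False


print(can_deliver(Access.PUBLIC))
print(can_deliver(Access.PROTECTED, owner_approved=True))
print(can_deliver(Access.PRIVATE, owner_approved=True))
```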


2.3. Node profile 

<node>
  <address>NodeIP</address>
  <public>
    <contenttag>Tagname[JAVA]
      <filename>Name of the file</filename>
    </contenttag>
    <contenttag>Tagname[C++]
      <filename>Name of the file</filename>
    </contenttag>
  </public>
  <protected>
    <contenttag>Tagname[Photos]
      <filename>Name of the file</filename>


3.0 ALGORITHMS FOR ROUTE DISCOVERY 

The basic structure is taken from ARA [3]; attributes such as
timer value, content tag and content relevancy are updated. The
standard stack structure that holds the path information is
replaced by a path variable, since XML format is used for forward
ants. The XML schema is used to check integrity, so corrupted
forward ants can be discarded.

Route discovery is the process of finding possible paths [...]
The result of the route discovery process is the generation of
[...]








if 25, —8,4,, then 
if астар € nctaglist && n.,—public |l protected then 
Sag -updateandgenezate(a,. Quos Opa.) 


Converttobackwardant( fay) 


ЕВ 


Routediscovery(f au Mig) 


end 
else т . 
discard ы) 
end 
Generation of a backward ant is a simple process of changing the
addresses and setting proper timer values. Backward ant
relevancy is filled in with the help of ranking algorithms.
3: Generation of backward ant


4.0 PATH UPDATION 

4.1 BACKWARD ANT FROM DESTINATION 

The backward ants are used to update the discovered path.
In existing algorithms, backward ants calculate
pheromone values using queuing delay and MAC delay, as in
AntHocNet [4]; in turn, pheromone values are used to calculate
the probability of goodness. Here the content relevancy is also
included in the calculation of the pheromone value. The new
pheromone value is calculated as

$\Gamma_{nd} = \alpha \, \Gamma_{nd} + (1 - \alpha) \, \Gamma_{nd} \, C_r$    (1)

where
$\Gamma_{nd}$ : new pheromone value for the neighbor,
$\alpha$ : 0.7, taken from standards [good probability],
$C_r$ : content relevancy.





Content relevancy ranges over [0,1] and has a direct
impact on the pheromone value. This pheromone value is written
to the node table of every node by backward ants.
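
The pheromone update with content relevancy can be written as a short function. This is a sketch under assumptions: the function and parameter names are hypothetical, and the stored pheromone value is used for both pheromone terms on the right-hand side, following the update line in Algorithm 5.

```python
ALPHA = 0.7  # value quoted in the text


def update_pheromone(pheromone: float, content_relevancy: float,
                     alpha: float = ALPHA) -> float:
    """Return alpha * pheromone + (1 - alpha) * pheromone * content_relevancy."""
    if not 0.0 <= content_relevancy <= 1.0:
        raise ValueError("content relevancy must lie in [0, 1]")
    return alpha * pheromone + (1.0 - alpha) * pheromone * content_relevancy


# Fully relevant content keeps the pheromone; irrelevant content decays it.
print(update_pheromone(1.0, 1.0))
print(update_pheromone(1.0, 0.0))
```

With this form the value never grows beyond its current level, so relevancy acts as a brake on decay: a neighbor leading to irrelevant content loses pheromone faster than one leading to relevant content.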

The backward ant travels back to the source by following the path
information and updates the pheromone value, which also includes
the content relevancy ratio; content relevancy has a major
impact on retrieving relevant content. The timer value is
decremented automatically so that long-waiting backward ants can
be discarded. If a backward ant is unable to reach the source on
time, this clearly indicates that the path is congested and data
delivery will not be fruitful. It is therefore better to discard
the backward ant at the intermediate node itself, so that the path
is avoided for data delivery.
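
The timer-based discard of backward ants can be sketched as follows. The class, the field names and the use of milliseconds are assumptions made for illustration; the sketch only shows the decision taken at each intermediate node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackwardAnt:
    remaining_path: List[str]  # nodes still to visit on the way to the source
    timer_ms: int              # remaining time budget (hypothetical unit)


def relay(ant: BackwardAnt, hop_delay_ms: int) -> Optional[str]:
    """Decrement the timer by the delay of this hop and return the next node,
    or None if the ant has to be discarded at this node."""
    ant.timer_ms -= hop_delay_ms
    if ant.timer_ms <= 0 or not ant.remaining_path:
        return None
    return ant.remaining_path.pop(0)


ant = BackwardAnt(remaining_path=["2", "1", "S"], timer_ms=100)
print(relay(ant, 30))   # lightly loaded hop: ant moves on
print(relay(ant, 80))   # congested hop: timer exhausted, ant is discarded
```

Because a discarded ant never updates the pheromone values closer to the source, the congested path is not reinforced and is less likely to be chosen for data delivery.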


Algorithm 5: unicast(b,.) 
Input : bz backward ant 


Output: m:updated path Б 
II find content relevancy and assign it to Cr. 

b am AFO, b a 0,70, b. ue 1,0 bant.acrel-Cr 

if Daw dq, —currentNodelP then 


Γ_nd = α * Γ_nd + (1 - α) * Γ_nd * C_r
start new unicast request from bes Ger 
to bant.adst 
end 
else if bay adn! =currentNodelP then 
if Б... аъ. bau Gmie &® 
Db A, > 00:00:00:00 then 


pickup next node from bas pad, and 


autodecrement Dow Gamer 3 
// update pheromone value based on content 
relevancy 


else 
discard(b eu) 

end 

end 


The free function frees the memory allocated
to a particular ant. A node can delete a forward ant which has
crossed the maximum hop count, as well as time-exhausted backward
ants. The discard algorithm is used to control flooded forward
and backward ants wherever possible to reduce the [...]

Input : f_ant: forward ant or b_ant: backward ant
Output : null
free(...);


4.2 WORKING ANTS 

Once content discovery and path establishment are over, the
data transaction thread has to be started. The data
transaction thread follows a sequence: first, a unicast
request is sent from source to destination to confirm data
delivery; the destination node starts data delivery after
receiving the confirmation; the payload is accompanied by a
backward ant.








Figure 4.3: Backward ant with payload 





5.0 SIMULATION AND RESULTS
The simulation is run with 60 nodes moving at random way [...]
Contents and content relevancies are distributed randomly
among all nodes. The simulation is executed for random durations
ranging from 20 to 200 seconds, and requests are made for
different contents from different nodes. The proposed
algorithm is compared with ARA [3], and it outperforms ARA in
finding relevant content in a short period.
Figure 5.1: Content relevancy analysis

ASHRAAM gathers more relevant results while searching for
content in a MANET. ARA only chooses the minimum hop
count destination to retrieve contents, which may be irrelevant.
It therefore has to perform the route discovery process again and
again, which creates more congestion and flooded ants in the
MANET. ASHRAAM initiates only one route discovery
process to get all destinations and content relevancies for the
content.


6.0 CONCLUSION 

In this study a new proposal for content based routing using ant 
agents has been made as a framework. The proposed 
framework can reduce the congestion in MANET, and also it 
can diminish the number of retransmissions of forward and 
backward ants. The timer concept decides the lifetime of a
backward ant: a backward ant which is trapped in a heavily
congested network is dropped automatically after a period
of time. Long-waiting backward ants would report unreliable
paths to the source node; with the help of the timer concept this
problem is avoided. This framework opens a
new path towards content based route discovery and XML
based ant generation, leveraging the benefits of both.
ABSTRACT

The present work considers a fleet of fishing trolleys. The 
MANEMO is the integration of mobile ad-hoc network 
technology and mobile network technology for maintaining the 
uninterrupted connectivity among the fishing trolleys in deep 
sea. It provides local connectivity among the fishing trolleys for 
offshore help using mobile ad-hoc network and global 
connectivity among the fishing trolleys for onshore help using 
mobile network. Each fishing trolley works as a separate node 
in case of local communication whereas all the fishing trolleys 
form a mobile network in case of global communication. The 
routing algorithm in MANET environment selects an optimal 
route for a session before starting transmission of data packets 
associated with that session. It uses route maintenance 
algorithm to detect whether a trolley associated with an 
existing route is going out of the communication range during 
the ongoing session in advance before the existing route fails 
completely. Such consideration helps to reduce the data packet 
loss. If the local communication among trolleys fails due to the 
change in network topology the rest of the communication can 
be maintained globally which helps to provide the 
uninterrupted connectivity to the fishermen in the deep sea. The 
performance of the proposed scheme is evaluated on the basis 
of initial path set up time and average packet delay. 


KEYWORDS 
MANET, MObile NEtwork, Basic POSANT routing algorithm,
WLAN, WiFi/WiMAX


1.0 INTRODUCTION 

In today's society many people spend a lot of time in vehicles.
They need network connectivity for various safety and non- 
safety applications. The MObile NEtwork technology (NEMO) 
and Mobile Ad-hoc NETwork (MANET) technology are 
integrated in MANEMO to maintain global and local 
connectivity among vehicles. The vehicles can communicate 
using MANET when they are close enough. They can 
communicate using NEMO for getting help like weather 
forecast, accidents, and attack from intruder etc. They also use 
NEMO for communication in case the MANET technology 
fails to maintain local communication due to the change in 
network topology. 

Several such i ion schemes have been reported so far. 
The mobile routers (MRS) [1] deployed in car not only provide 
external communication access but also manage the mobility of 


the whole network transparently. In [2] the MANET routing 
protocol is used to achieve multi hop communication between a 
MANET node and an attachment point in case the attachment 
point is within the coverage area of MANET. The multi hop 
path between a MANET node and an attachment point is 
established through NEMO in case the attachment point is out 
of the coverage area of MANET. In this scheme NEMO 
environment provides infrastructure connectivity whereas the 
MANET environment deals with routing issues internally to a 
mobile network. The authors did not present any simulation 
results. A vehicular network integrating VANET with
NEMO is proposed in [3]. In this scheme the receive on most
stable group path and link expiration time threshold are used to
find the most stable link in the VANET environment. But the
proposed VANET routing is unable to offer the best throughput.
An in-vehicle router system to support network mobility is
proposed in [4]. This scheme is the combination of Mobile 
IPv6, Interface switching and Prefix Scope Binding update to 
achieve end to end permanent connectivity and migration 


system (GPS) to view the location of vehicles in Google map. 
The integration of NEMO and MANET is proposed in [6] to
support rescue team communication and the experiment is
conducted with a mountain rescue team. Tsukada et al.
described the co-operation between MANET and NEMO [7] to
support route optimization and multi-homing. The authors used
the optimized link-state routing (OLSR) algorithm for the
MANET environment. But if the data packets are transmitted
at a slow speed, control packets must be transmitted to verify
that the route still exists, which increases the overhead of
the OLSR algorithm. They mainly focused on defining the
architecture and the purpose of integration. The authors did not
evaluate the detailed usage of the combination and its utility
experimentally.

The present work considers a fleet of fishing trolleys as
vehicles. All the fishing trolleys belong to the same fishing
harbor, which is their home network associated with a Home
Agent (HA). An MR is associated with each trolley. Each MR
works as an individual node in case of local communication
using the MANET interface. All MRs form a mobile network
which is connected with the HA through the Internet. They
communicate globally with the HA through the Internet using the NEMO
interface and the MR-HA tunnel. One of the trolleys works as a
special fixed node (SFN) for MANET and as a local fixed node
(LFN) for NEMO. It maintains the optimal route (OR)
information for both MANET and NEMO. It does not take any
part in communication. The access router (AR) installed on the
island forms a foreign network for the fishing trolleys and is
considered as the Island Side Unit (ISU). The AR installed on the
shore forms a foreign network for the fishing trolleys and is
considered as the Shore Side Unit (SSU). The ISU maintains
connectivity between the fishing trolleys and the HA through the
Internet in case the fishing trolleys are closer to the island. The
SSU maintains connectivity between the fishing trolleys and
the HA through the Internet in case the fishing trolleys are closer to
the shore.

The WLAN is preferred for MANET due to its lesser
communication range and power consumption [8] whereas
WiFi/WiMAX is preferred for NEMO due to its higher
communication range and power consumption [8]. The cost
and power consumption of maintaining local communication
among trolleys is reduced by using WLAN for MANET in the
present work. Moreover such communication among trolleys is
secured as it does not need Internet access. So the
communication among trolleys is maintained locally in most of
the cases. But a route failure may occur in MANET during
local communication among trolleys due to a change in
network topology. In such a situation the rest of the
communication among trolleys can be maintained through the HA
using the MR-HA tunnel if it is not possible to select an alternative
route in MANET. So the integration of MANET technology
and NEMO technology in the present work helps to maintain
uninterrupted connectivity among the fishermen in the deep
sea.


2.0 ROUTING ALGORITHMS FOR MANET 

Two different routing algorithms for MANET are proposed in
the present work. The algorithms are discussed
in the following sections.


2.1 HA POSANT ROUTING ALGORITHM 

The HA is equipped with Google Map [9] and each trolley is
equipped with GPS. A source node (S_id) sends a route request
message (RRM) (as discussed in section 2.1.1) to the HA for the
initiation of a session with a destination node (D_id). The HA
triggers the route selection algorithm (as discussed in section 2.1.2)
to select an OR in response to the RRM and sends the OR to S_id
using the RFM message (as discussed in section
2.1.1). The HA assigns a unique session identification (SS_id)
to each session after selecting an OR for it. After receiving
the RFM, S_id generates a Type 0 packet (T0, as discussed in section
2.1.3). The Route field of T0 contains the identification of all
the nodes which are associated with the OR as mentioned in the
Route field of the RFM by the HA. S_id sends this packet to D_id
through all the nodes which are identified in the Route field of
T0. Each node maintains a routing table (RT) (as discussed in
section 2.1.4) and inserts a record in RT after receiving T0.
Both S_id and D_id associated with a particular session
generate Type 1 packets (T1, as discussed in section 2.1.3) and
send these packets to each other to maintain the bidirectional
transmission of packets corresponding to a particular session
between them using the OR which is mentioned in the RFM. The HA
maintains a session table (ST) (as discussed in section 2.1.5) to
store the information of all the ongoing sessions among nodes
in MANET. The HA inserts a record in ST after selecting an
OR. As soon as an ongoing session is over, the S_id associated with
this session sends a session over message (SOM) (as discussed in
section 2.1.1) to the HA. The HA searches ST for the record whose
SS_id attribute matches the SS_id field mentioned in the
SOM and deletes that record from ST. The HA executes the
route maintenance algorithm (as discussed in section 2.1.6) to
detect node(s) which are associated with existing route(s) and
are going out of the communication range of a neighboring
node associated with the same route during the ongoing
session. In such a case the HA considers the existing route(s) as
faulty and executes the route selection algorithm for the selection
of alternative OR(s) to replace the faulty existing route(s). It
sends the alternative OR to S_id(s) using a route maintenance
message (RMM) (as discussed in section 2.1.1). After receiving
the RMM, S_id(s) generates a Type 2 packet (T2, as discussed in
section 2.1.3). The N_Route field of T2 contains the
identification of all the nodes which are associated with the
alternative OR as mentioned in the N_Route field of the RMM by
the HA. S_id sends T2 to D_id through all the nodes which are
identified in the N_Route field of T2 for necessary insertion or
modification in their RT.


2.1.1 MESSAGE EXCHANGE AMONG VARIOUS 
NODES 

RRM contains S_id and D_id fields. RFM contains S_id, D_id,
SS_id and Route fields. SOM has SS_id and F_flag fields. The
F_flag field of SOM is set to indicate the end of the session which
is identified in its SS_id field. RMM has SS_id, S_id and
N_Route fields.


2.1.2 ROUTE SELECTION ALGORITHM 

The GPS detects the current location of each node in terms of
longitude and latitude. The GPS sends this information
to the HA as soon as the current location of any node
changes. The HA uses Vincenty's inverse equation [10] to
calculate the distance between two neighboring nodes from
their current locations as provided by GPS. The longitude
and latitude of the fishing area are provided by the fishing
authority to the HA. The Google Map in the HA shows the real time
image of each node within the fishing area using the
information provided by the GPS and the information provided
by the fishing authority. The HA maintains a graph of nodes
using their real time image, which is provided by the Google
Map continuously, and creates a rectangular boundary around
the graph of nodes. If any intruder node crosses the rectangular
boundary from outside, the HA sends a special security signal to the
node(s) closer to the intruder node. After receiving an RRM the
HA applies depth first search to the graph and finds all possible
routes from S_id to D_id. The HA counts the number of nodes
in each possible route and selects the route having minimum
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number of nodes as the best route. The HA uses basic 
POSANT routing algorithm [11] to select the OR in case of multiple
best routes.
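
The depth first search step described above can be illustrated with a minimal sketch. It assumes the HA keeps the graph as an adjacency list keyed by node identification; the function names and the six-trolley topology are hypothetical and are not taken from the paper. The tie-break among several equally short routes (done with basic POSANT in the text) is left out.

```python
from typing import Dict, List

# Assumption: node id -> ids of the nodes currently within communication range.
Graph = Dict[str, List[str]]


def all_routes(graph: Graph, s_id: str, d_id: str) -> List[List[str]]:
    """Enumerate every loop-free route from s_id to d_id by depth first search."""
    routes: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == d_id:
            routes.append(path[:])
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node on the current path
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(s_id, [s_id])
    return routes


def best_routes(graph: Graph, s_id: str, d_id: str) -> List[List[str]]:
    """Keep only the routes with the minimum number of nodes."""
    routes = all_routes(graph, s_id, d_id)
    if not routes:
        return []
    fewest = min(len(route) for route in routes)
    return [route for route in routes if len(route) == fewest]


# Hypothetical topology of six trolleys.
fleet: Graph = {
    "T1": ["T2", "T3"],
    "T2": ["T1", "T4"],
    "T3": ["T1", "T4", "T5"],
    "T4": ["T2", "T3", "T6"],
    "T5": ["T3", "T6"],
    "T6": ["T4", "T5"],
}
candidates = best_routes(fleet, "T1", "T6")
```

In this topology several routes share the minimum node count, which is the situation in which a further selection among the best routes is needed.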


2.1.3 TYPE OF PACKETS 

T0 contains SS_id, S_id, D_id, Type and Route fields. T1
contains SS_id, Node_id, S_No, Type and PAYLOAD fields.
The Node_id field is S_id in case the packet is generated by
S_id and D_id in case the packet is generated by D_id. The
S_No field indicates the sequence number of the packet. The
PAYLOAD field contains the data corresponding to the session
which is identified by SS_id. T2 has SS_id, S_id, D_id, S_No,
Type, N_Route and PAYLOAD fields. The Type field in T0,
T1 and T2 indicates their type.


2.1.4 ROUTING TABLE (RT) 
Each record in RT has 5 attributes as shown in TABLE-1.


SS_id    S_id    D_id    SN_NH    DN_NH
s        S       D       T        E


Table 1


Let TABLE-1 be the RT which is maintained by the j-th node, showing
a record for the s-th session. S_id and D_id which are
associated with the s-th session are identified as S and D
respectively in TABLE-1. T indicates the next hop of the j-th
node in case of transmission from S to D and E indicates the
next hop of the j-th node in case of transmission from D to S in
TABLE-1. After receiving T0 the j-th node inserts a record in
TABLE-1. After receiving T1 the j-th node searches TABLE-1
for the existing record whose SS_id attribute matches the
SS_id field mentioned in T1. Then it compares the S_id
attribute and the D_id attribute of the existing record with the
Node_id field mentioned in T1. If the Node_id field in T1
matches the S_id attribute of the existing record, the j-th
node forwards the packet to T, and if the Node_id field in T1
matches the D_id attribute of the existing record, the j-th
node forwards the packet to E. After receiving T2 the j-th node
searches RT for the existing record whose SS_id attribute
matches the SS_id field mentioned in T2. If found, it
updates the record by replacing the old route attribute by the
new route attribute as mentioned in T2. Otherwise, it inserts a
new record in RT. When a node is no longer participating in packet
transmission corresponding to a particular session, it deletes the
corresponding record from RT.
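
The record handling described above can be sketched as follows. This is only an illustration under stated assumptions: the attribute names SN_NH and DN_NH are taken from section 2.3.1, while the class and method names are hypothetical, and T0 and T2 are assumed to be processed identically (insert the record if absent, replace it otherwise).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RTRecord:
    ss_id: int
    s_id: str
    d_id: str
    sn_nh: Optional[str]  # next hop for packets generated by S_id (towards D_id)
    dn_nh: Optional[str]  # next hop for packets generated by D_id (towards S_id)


class RoutingTable:
    """RT kept by one node; at most one record per session."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.records: Dict[int, RTRecord] = {}

    def on_t0(self, ss_id: int, route: List[str]) -> None:
        """Derive both next hops from this node's position in the Route field."""
        i = route.index(self.node_id)
        sn_nh = route[i + 1] if i + 1 < len(route) else None
        dn_nh = route[i - 1] if i > 0 else None
        self.records[ss_id] = RTRecord(ss_id, route[0], route[-1], sn_nh, dn_nh)

    # Assumption: a T2 packet (N_Route field) updates or inserts the same way.
    on_t2 = on_t0

    def next_hop_for_t1(self, ss_id: int, node_id: str) -> Optional[str]:
        """Pick the forwarding direction of a T1 packet from its Node_id field."""
        rec = self.records.get(ss_id)
        if rec is None:
            return None
        if node_id == rec.s_id:
            return rec.sn_nh
        if node_id == rec.d_id:
            return rec.dn_nh
        return None

    def end_session(self, ss_id: int) -> None:
        """Delete the record once the node no longer serves the session."""
        self.records.pop(ss_id, None)
```

A single record therefore serves both directions of a session, which is the property used in the storage comparison of section 2.3.1.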

2.1.5 SESSION TABLE (ST) 


Each record in ST has 3 attributes as shown in TABLE-2.


2.1.6 ROUTE MAINTENANCE ALGORITHM

The HA continuously computes the distance between each pair of neighboring
nodes using the information provided by GPS and
Vincenty's inverse equation. The HA considers a
node as MOVE_NODE in case its distance from the
neighboring node crosses a threshold. The threshold distance is
computed during simulation as discussed in section 4.1.2. As
soon as the HA detects such a node, it searches the Route attribute
of all the records in ST. It selects the record(s) whose Route
attribute contains the identification of the MOVE_NODE. If
found, it retrieves the selected record(s). It executes the route
selection algorithm for the selection of alternative OR(s)
before the existing route(s) fail completely. Such advance
selection of an alternative route helps to reduce packet loss of a
session. The HA updates the selected record(s) by replacing the
old route attribute by the new route attribute in ST.
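
The maintenance step at the HA can be sketched as below. The sketch assumes that ST is reduced to a mapping from SS_id to the Route attribute and that the route selection algorithm is passed in as a callback; all function names are hypothetical.

```python
from typing import Callable, Dict, List

Route = List[str]


def is_move_node(distance_m: float, threshold_m: float) -> bool:
    """A node becomes MOVE_NODE once its neighbor distance crosses the threshold."""
    return distance_m > threshold_m


def sessions_through(session_table: Dict[int, Route], move_node: str) -> List[int]:
    """SS_id of every ST record whose Route attribute contains MOVE_NODE."""
    return [ss_id for ss_id, route in session_table.items() if move_node in route]


def maintain_routes(
    session_table: Dict[int, Route],
    move_node: str,
    select_route: Callable[[str, str], Route],
) -> Dict[int, Route]:
    """Replace the route of each affected session before it fails completely."""
    updated: Dict[int, Route] = {}
    for ss_id in sessions_through(session_table, move_node):
        old_route = session_table[ss_id]
        new_route = select_route(old_route[0], old_route[-1])
        session_table[ss_id] = new_route  # old route attribute -> new one
        updated[ss_id] = new_route        # to be sent to S_id in an RMM
    return updated
```

Sessions whose route does not contain the MOVE_NODE are left untouched, so only the affected source nodes receive an RMM.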

The installation of Google Map along with GPS increases the 
cost of the system. Moreover the GPS may not be able to work 
properly in situations such as underwater conditions e.g. within 
submarines. In such a situation radio detection and ranging 
(RADAR) works well. The RADAR POSANT routing 
algorithm is considered for discussion in section 2.2. 


2.2 RADAR POSANT ROUTING ALGORITHM 

Each node is equipped with two antennas, one at the front end
and one at the rear end of the node. Both antennas can work
as transmitter as well as receiver to achieve bidirectional
transmission of packets corresponding to a particular session.
S_id triggers the route selection algorithm (as discussed in section
2.2.1) by forwarding an ant packet towards D_id for the initiation
of a session as in the basic POSANT routing algorithm. D_id
selects an OR and sends it to the SFN (SFN_id) using the D_to_SFN
message (as discussed in section 2.2.2). The SFN sends the OR to
S_id using the SFN_to_S message (as discussed in section 2.2.2).
The SFN assigns a unique SS_id to each session after receiving
the OR from D_id. After receiving the SFN_to_S message S_id
generates T0 (as discussed in section 2.1.3). The Route field of
this packet contains the identification of all the nodes which are
associated with the OR as mentioned in the Route field of the
SFN_to_S message by the SFN. S_id sends T0 to D_id through all
the nodes which are identified in the Route field of T0. Each
node maintains RT (as discussed in section 2.1.4) and inserts a
record in RT after receiving a T0. Both S_id and D_id
associated with a session generate T1 (as discussed
in section 2.1.3) and send T1 to each other to maintain the
bidirectional transmission of packets corresponding to a
particular session between them using the OR as mentioned in the
SFN_to_S message. The SFN maintains an ST (as discussed in
section 2.1.5) to store the information of all the ongoing
sessions among nodes in MANET. The SFN inserts a record in
ST after receiving the D_to_SFN message. As soon as an ongoing
session is over, the S_id associated with this session sends SOM (as
discussed in section 2.1.1) to the SFN. The SFN searches ST for
the record whose SS_id attribute matches the SS_id field
mentioned in SOM and deletes that record from ST. Each
node associated with an existing route executes the route
maintenance algorithm (as discussed in section 2.2.3) to detect
whether its neighboring node associated with the same route is
going out of the communication range during the ongoing
session and sends an alarming signal to the neighboring node.
In response the neighboring node sends its identification to the
SFN. In such a case the SFN considers the existing route as
faulty and sends the SFN_ALT_ROUTE message (as discussed in
section 2.2.2) to the S_id which is associated with the faulty route
for the execution of the route selection algorithm. S_id
executes the route selection algorithm for the selection of an
alternative OR to replace the faulty existing route. After
selecting the alternative OR, S_id generates T2 (as discussed in
section 2.1.3). The N_Route field of T2 contains the
identification of all the nodes which are associated with the
alternative OR as selected by S_id. S_id sends T2 to D_id
through all the nodes which are identified in the N_Route field
for necessary insertion or modification in their RT.


2.2.1 ROUTE SELECTION ALGORITHM 

S_id forwards the ant packet through all the possible routes
between S_id and D_id associated with a particular session as
in the basic POSANT routing algorithm. The ant packet deposits
a pheromone value on each link. The maximum pheromone value
is deposited on the link having the smallest length. The ant packet
has 6 fields as shown in Fig.1.

| S_id | D_id | A_F | T_S | Route | P_C |


Figure 1: Format of ant packet 


The A_F field is set to indicate the type of the packet as ant.
Let the i-th node receive an ant packet from the k-th node and let the j-th node be
the successor of the i-th node. The i-th node mentions the current
time stamp in the T_S field of the ant packet and forwards it to
the j-th node. The i-th node adds its identification to the Route
field of the ant packet. The i-th node computes the difference in
time stamp (Diff_time) between the current time stamp
corresponding to the time of receiving the ant packet and
the time stamp in the T_S field of the ant packet as mentioned
by the k-th node. It computes its distance from the k-th
node (D_ik) by multiplying Diff_time and the speed of the
electromagnetic signal (m/sec), as packets consist of digital
bits and are sent using electromagnetic signals. The bit error
rate increases rapidly when the distance between two
neighboring nodes in the WLAN environment is greater than
45 meters [12]. So in the present work the pheromone value of
the link between the i-th node and the k-th node (P_value_ik) is
assumed as 20 if D_ik < 45, otherwise it is assumed as 1. The i-th
node also multiplies the value in the P_C field of the ant packet
as mentioned by the k-th node by P_value_ik. At the i-th node the value
in the P_C field of the ant packet indicates the pheromone
concentration of the route from S_id up to the i-th node.
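
The per-hop update can be sketched as below. It assumes that the P_C field starts at 1 at S_id (the text only states that it is multiplied at each hop) and that time stamps are expressed in seconds; the function names are hypothetical. The last helper mirrors the comparison of P_C values carried out at D_id.

```python
from typing import List, Sequence, Tuple

SIGNAL_SPEED_M_PER_S = 3.0e8  # speed of the electromagnetic signal
WLAN_LIMIT_M = 45.0           # beyond this distance the bit error rate rises [12]


def link_length_m(t_s_field: float, t_received: float) -> float:
    """D_ik = Diff_time * signal speed, with both time stamps in seconds."""
    return (t_received - t_s_field) * SIGNAL_SPEED_M_PER_S


def link_pheromone(d_ik_m: float) -> int:
    """P_value_ik is 20 for a short link and 1 otherwise."""
    return 20 if d_ik_m < WLAN_LIMIT_M else 1


def pheromone_concentration(link_lengths_m: Sequence[float]) -> int:
    """P_C after the ant has crossed every link of a route (assumed to start at 1)."""
    p_c = 1
    for d_ik in link_lengths_m:
        p_c *= link_pheromone(d_ik)
    return p_c


def select_or(ants: List[Tuple[List[str], int]]) -> List[str]:
    """Return the Route field of the ant carrying the maximum P_C value."""
    return max(ants, key=lambda ant: ant[1])[0]
```

Because the update is multiplicative, one link of 45 m or more reduces the concentration of a route by a factor of 20 relative to a route of the same hop count made of short links only.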
The D_id receives multiple ant packets through all possible
routes between S_id and D_id. It compares the P_C value of all
the received ant packets. The Route field in the ant packet
having the maximum P_C value is selected as OR.


2.2.2 MESSAGE EXCHANGE AMONG VARIOUS
NODES

D_to_SFN message contains S_id, SFN_id and Route fields.
SFN_to_S message contains SS_id, S_id, SFN_id and Route
fields. SFN_ALT_ROUTE message has S_id, SFN_id and
SS_id fields.


2.2.3 ROUTE MAINTENANCE ALGORITHM 

Each node associated with an existing route computes its
distance from the neighboring node which is associated with
the same route using the mono-static equation [13]. The mono-static
equation used by the RADAR antennas in this scheme is
as follows:

\[
P_r = 10 \log_{10}\left[\frac{P_t G_t G_r \lambda^2 \sigma}{(4\pi)^3 R^4}\right]
    = 10 \log_{10}\left[\frac{P_t G_t G_r \sigma c^2}{(4\pi)^3 f^2 R^4}\right]
\]

where P_r = Received peak power, P_t = Transmitted peak
power, G_t = Gain of transmitter antenna (dBi), G_r = Gain of
receiver antenna (dBi), λ = Transmitted wavelength (m, cm, in,
etc.), σ = Radar cross-section of target - RCS (m^2, cm^2, in^2,
etc.), R = Range (m, cm, in, etc.), c = speed of light and
f = transmitted frequency (λ = c/f). The
parameter values of the mono-static equation are assumed as
follows: P_t = 20 dBm, G_t = G_r = 16 dBi, λ = 15 cm, σ = 2.5 m^2,
c = 3*10^8 meter/sec. The parameter R indicates the
distance between the two neighboring nodes. P_r is measured at
the receiving antenna and R is computed using the mono-static
equation using the known values of all the other parameters.
Each node associated with an existing route also computes its
angle with the neighboring node which is associated with the
same route using the generalized Pythagoras theorem (law of cosines). In ΔABC (Fig.2) the
vertex B and the vertex C represent the location of the front end
and rear end antenna in a node. The vertex A represents the
location of the neighboring node. In ΔABC the side AB (=c)
represents the distance between the neighboring node and the
front end antenna. P_r is measured at the front end antenna and c
is computed using the mono-static equation. The side AC (=b)
represents the distance between the neighboring node and the
rear end antenna. P_r is measured at the rear end antenna and b is
computed using the mono-static equation. The side BC (=a)
represents the length of the node (trolley). AP (=h) is
perpendicular to BC.








Figure 2: Triangular representation of the angle calculation
process.


The angle between the two neighboring nodes (angle C) is
\[ C = \cos^{-1}\left(\frac{a^2 + b^2 - c^2}{2ab}\right) \]
using the generalized Pythagoras theorem (law of cosines). A node sends
an alarming signal to its neighboring node
(RECEIVED_NODE) in the direction of the angle as computed
above in case its distance from the
RECEIVED_NODE crosses a threshold. The threshold
distance is computed during simulation as discussed in section
4.1.2. The RECEIVED_NODE sends its Node_id to the SFN. The
SFN searches the Route attribute of all the records in ST. It
selects the record(s) whose Route attribute contains the
identification of the RECEIVED_NODE. If found, it retrieves
the selected record(s) and sends the SFN_ALT_ROUTE message
to the S_id(s) associated with the selected record(s) to execute the
route selection algorithm for the selection of alternative
OR(s) before the existing route(s) fail completely. Such
advance selection of an alternative route helps to reduce packet
loss of a session. S_id forwards the ant packet towards D_id.
D_id selects an alternative OR and sends it to the SFN. The SFN
sends the alternative OR to S_id. The SFN updates the selected
record(s) by replacing the old route attribute by the new route
attribute in ST.
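
The range and angle computations of this section can be illustrated with a short sketch. It assumes that the received power is expressed in dBm so that the mono-static equation can be inverted in decibel form, and it uses the parameter values quoted in this section as defaults; the function names are hypothetical.

```python
import math


def received_power_dbm(r_m: float, pt_dbm: float = 20.0, gt_dbi: float = 16.0,
                       gr_dbi: float = 16.0, wavelength_m: float = 0.15,
                       rcs_m2: float = 2.5) -> float:
    """Mono-static equation in decibel form for a target at range r_m."""
    geometry = (wavelength_m ** 2) * rcs_m2 / (((4 * math.pi) ** 3) * r_m ** 4)
    return pt_dbm + gt_dbi + gr_dbi + 10 * math.log10(geometry)


def range_from_power_m(pr_dbm: float, pt_dbm: float = 20.0, gt_dbi: float = 16.0,
                       gr_dbi: float = 16.0, wavelength_m: float = 0.15,
                       rcs_m2: float = 2.5) -> float:
    """Solve the mono-static equation for R given the measured received power."""
    geometry = 10 ** ((pr_dbm - pt_dbm - gt_dbi - gr_dbi) / 10)
    return ((wavelength_m ** 2) * rcs_m2 / (((4 * math.pi) ** 3) * geometry)) ** 0.25


def angle_c_deg(a: float, b: float, c: float) -> float:
    """Angle C of triangle ABC from its three sides (law of cosines)."""
    cos_c = (a * a + b * b - c * c) / (2 * a * b)
    cos_c = max(-1.0, min(1.0, cos_c))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_c))
```

Here a is the trolley length BC, while b and c are the ranges obtained from the rear end and front end antennas respectively.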


2.3 COMPARISON OF ROUTING ALGORITHMS

The performance of the ANTNET, GPSR, ANTHOCNET and
basic POSANT routing algorithms is compared on the basis
of delivery rate, convergence time and algorithm overhead in
[11]. In this section the basic POSANT routing algorithm [11],
HA POSANT routing algorithm and RADAR POSANT
routing algorithm are compared on the basis of storage
requirement, RT searching time and time complexity of the
algorithm.


2.3.1 STORAGE REQUIREMENT 
In basic POSANT routing algorithm each node maintains a
forward RT to send packets from S_id to D_id and a backward
RT to send packets from D_id to S_id. Each record in RT has 3
attributes as shown in TABLE-3. Let TABLE-3 be the forward
RT at the j-th node. The Node_Address attribute is the address of
D_id in case of the forward RT. The Next_Hop attribute is the
address of the next hop node from the j-th node towards the destination
which is identified by the Node_Address attribute. The
Pheromone_Value attribute indicates the pheromone value
corresponding to the next hop node which is indicated by the
Next_Hop attribute. The Node_Address attribute and the
Next_Hop attribute are 128 bit IPv6 addresses. The maximum
pheromone value which is deposited on a link is 20 as discussed
in section 2.2.1 and the number of bits required to represent
the maximum pheromone value is 5. So the length of each
record in the forward RT at any node is 261 bits (128+128+5). The number
of records in the forward RT at the j-th node for a single session
depends upon the number of possible next hop nodes from the j-th
node towards the destination. The storage requirement per
forward RT is (261*number of possible next hops towards
D_id) bits.


Node_Address    Next_Hop    Pheromone_Value
128 bits        128 bits    5 bits


Table 3


Let TABLE-3 be the backward RT at the j-th node. The
Node_Address attribute is the address of S_id in case of the
backward RT. The Next_Hop attribute is the address of the
next hop node from the j-th node towards the source which is identified
by the Node_Address attribute. The number of records in the
backward RT at the j-th node for a single session depends upon the
number of possible next hop nodes from the j-th node towards the
source. The storage requirement per backward RT is
(261*number of possible next hops towards S_id) bits. So the
storage for each bidirectional session is
261*(number of possible next hops towards destination +
number of possible next hops towards source) bits.

In HA POSANT routing algorithm and RADAR POSANT
routing algorithm each node maintains a single RT as shown in
TABLE-1. The S_id, D_id, SN_NH and DN_NH are 128 bit
IPv6 addresses. For 1000 different bidirectional
sessions the number of bits required to represent SS_id is 10.
So the length of each record in RT is 522 bits (4*128+10). There is a single
record for each bidirectional session in RT and so the storage
requirement for each bidirectional session is 522 bits. The
storage requirement for each bidirectional session in basic
POSANT routing algorithm is greater than the storage
requirement in HA POSANT routing algorithm and RADAR
POSANT routing algorithm if the number of next hop nodes
from the j-th node towards S_id or D_id is greater than unity in
TABLE-3.
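
The storage figures above follow from simple arithmetic, which the sketch below reproduces. The constant and function names are hypothetical; the field widths are the ones stated in this section.

```python
IPV6_BITS = 128
PHEROMONE_BITS = (20).bit_length()  # 5 bits hold the maximum pheromone value 20
SS_ID_BITS = (999).bit_length()     # 10 bits distinguish 1000 sessions (0..999)

# Basic POSANT: Node_Address + Next_Hop + Pheromone_Value
BASIC_RECORD_BITS = 2 * IPV6_BITS + PHEROMONE_BITS
# HA/RADAR POSANT: SS_id + S_id + D_id + SN_NH + DN_NH
PROPOSED_RECORD_BITS = 4 * IPV6_BITS + SS_ID_BITS


def basic_posant_bits(next_hops_to_dest: int, next_hops_to_src: int) -> int:
    """Forward RT plus backward RT storage for one bidirectional session."""
    return BASIC_RECORD_BITS * (next_hops_to_dest + next_hops_to_src)


def proposed_bits() -> int:
    """A single record serves both directions of the session."""
    return PROPOSED_RECORD_BITS
```

With exactly one next hop in each direction both schemes need 522 bits; any additional candidate next hop adds another 261 bits in basic POSANT only.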


2.3.2 RT SEARCHING TIME

Let, in case of basic POSANT routing algorithm, the number of
forward ongoing sessions through the j-th node as an intermediate
node be m and the number of next hops from the j-th node towards
D_id be n. So at the j-th node the forward RT contains m*n
records and the time complexity to select the desired record
from the forward RT is O(log2(m*n)). The j-th node compares the
pheromone value of all the n next hops and selects
the optimal next hop having the maximum pheromone value.
The link between the j-th node and the selected optimal next hop is
considered as the optimal outgoing link towards D_id. The time
complexity to select the optimal outgoing link from the forward
RT at the j-th node is O(n^2). So the total time complexity at the j-th node
for the selection of an optimal outgoing link is O(log2(m*n) + n^2).
In case of HA POSANT routing algorithm and RADAR
POSANT routing algorithm the RT at the j-th node contains m
records and the time complexity to select the desired record
from RT is O(log2 m).

So the time complexity of searching RT is higher in basic
POSANT routing algorithm than in HA POSANT routing
algorithm and RADAR POSANT routing algorithm.


2.3.3 TIME COMPLEXITY OF THE ALGORITHM 

In case of basic POSANT routing algorithm RT at each node
contains the possible next hops and their pheromone values.
During the ongoing session RT at each node is searched for the
selection of an optimal outgoing link. In case of HA POSANT
routing algorithm and RADAR POSANT routing algorithm RT
at each node contains OR. During the ongoing session RT at
each node is searched for OR. So OR is selected during the
ongoing session in basic POSANT routing algorithm, which
makes its time complexity higher than that of the HA POSANT routing
algorithm and the RADAR POSANT routing algorithm. The time
complexity of the HA POSANT routing algorithm is higher
than the time complexity of the RADAR POSANT routing algorithm
due to the time complexity of the depth first search.

3.0 ROUTING ALGORITHM FOR NEMO
The LFN inside the mobile network uses the route optimization
algorithm [14] for the selection of an OR to maintain global
communication among trolleys in NEMO.


phases. The performance of the basic POSANT [11] routing
algorithm, the HA POSANT routing algorithm and the RADAR
POSANT routing algorithm is compared in Phase 1. The
performance of MANEMO has been studied in Phase 2. The
simulation experiment is conducted for 1280 packets
and 6 trolleys in both the phases. The MANET in
the proposed scheme is the combination of some
interconnected processing units. The processing units are HA,
S_id, D_id and intermediate nodes associated with OR in case of HA
POSANT routing algorithm. The processing units are SFN,
S_id, D_id and intermediate nodes associated with OR in case
of RADAR POSANT routing algorithm. The NEMO in the
proposed scheme is the combination of some interconnected
processing units such as MNN, LFN and MR. Each processing
unit in MANEMO is treated as a thread and MANEMO is
considered as a large scale producer-consumer problem. In
HA POSANT routing algorithm the send request thread at S_id
sends RRM to HA. The receive request thread at HA searches
for RRM. If found, it selects OR and sends RFM. The receive
route thread at S_id searches for RFM. If found, the forward
packet thread at S_id forwards the packet to the ingress interface of
its associated MR. The transfer packet thread at each node
transfers the packet from the ingress interface to the egress
interface of the associated MR. In RADAR POSANT routing
algorithm the source request thread at S_id forwards the ant packet
towards D_id. D_id selects OR and sends the D_to_SFN message
using the route send thread. The SFN send thread at SFN searches
for the D_to_SFN message. If found, it sends the SFN_to_S message.
The receive route thread at S_id searches for the SFN_to_S
message. If found, the forward packet thread at S_id forwards the
packet to the ingress interface of its associated MR. The
transfer packet thread at each node transfers the packet from
the ingress interface to the egress interface of the associated
MR. The processing units and the corresponding threads in
NEMO are discussed in [14].
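
The producer-consumer view of the HA POSANT exchange can be sketched with two threads and two queues. This is only an illustration and not the simulator used in the experiments: the function name, the dictionary layout of the messages, the fixed SS_id and the merging of the source-side threads into one are all assumptions.

```python
import queue
import threading
from typing import Callable, List, Tuple


def run_session(s_id: str, d_id: str,
                select_route: Callable[[str, str], List[str]],
                payloads: List[str]) -> List[Tuple[int, List[str], str]]:
    """S_id produces an RRM, the HA consumes it and produces the RFM."""
    to_ha: "queue.Queue[dict]" = queue.Queue()      # carries RRM
    to_source: "queue.Queue[dict]" = queue.Queue()  # carries RFM
    forwarded: List[Tuple[int, List[str], str]] = []

    def receive_request() -> None:
        # Thread at the HA: wait for the RRM, select the OR, answer with the RFM.
        rrm = to_ha.get()
        route = select_route(rrm["S_id"], rrm["D_id"])
        to_source.put({"S_id": rrm["S_id"], "D_id": rrm["D_id"],
                       "SS_id": 1, "Route": route})

    def send_request_and_forward() -> None:
        # Threads at S_id (merged here): send RRM, wait for RFM, forward packets.
        to_ha.put({"S_id": s_id, "D_id": d_id})
        rfm = to_source.get()
        for payload in payloads:
            forwarded.append((rfm["SS_id"], rfm["Route"], payload))

    threads = [threading.Thread(target=receive_request),
               threading.Thread(target=send_request_and_forward)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return forwarded
```

The blocking `get` calls stand in for the "searches for RRM" and "searches for RFM" steps: each consumer thread simply waits until its producer has placed the message in the queue.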


4.1 EXPERIMENTAL RESULTS FOR PHASE 1 

The simulation experiment is conducted to compare the
performance of the three routing algorithms for MANET.
4.1.1 INITIAL PATH SET UP TIME 

It is the time to set up an OR for the initiation of a session. 


Fig.3 shows the plot of initial path set up time for all the three
routing algorithms. The basic POSANT routing algorithm
needs the transmission of forward ant packets and backward
ant packets for route selection. The HA POSANT routing
algorithm needs the transmission of RRM and RFM among
nodes for route selection instead of the transmission of forward
and backward ant packets, which reduces the initial
path set up time of HA POSANT routing algorithm compared to basic
POSANT routing algorithm. The RADAR POSANT routing
algorithm needs the transmission of forward ant packets for
initial route selection, which increases the initial path set up
time of RADAR POSANT routing algorithm compared to HA
POSANT routing algorithm. But the transmission of backward
ant packets is not required in RADAR POSANT routing
algorithm, which reduces the initial path set up time of RADAR
POSANT routing algorithm compared to basic POSANT routing
algorithm. It can be observed from Fig.3 that the initial path set
up time of basic POSANT routing algorithm is the highest and that of
HA POSANT routing algorithm is the lowest. The initial path set up
time of RADAR POSANT routing algorithm is higher than that of HA
POSANT routing algorithm but lower than that of basic POSANT
routing algorithm.


4.1.2 AVERAGE PACKET DELAY

Fig.4 shows the plot of average packet delay vs. simulation
time for all the three routing algorithms. It can be observed
from Fig.4 that the average packet delay is higher in basic
POSANT routing algorithm than in the other two routing
algorithms, as it selects OR during the ongoing session.

Fig.5 shows the plot of average packet delay vs. the number of
packets received for all the three routing algorithms. The speed
of the node is assumed as 6 km/hr. If a node associated with
OR of a particular session starts to move in the opposite
direction of another node associated with the same route, their
relative velocity becomes 12 km/hr. The communication range
of WLAN is assumed as 100 m. So the failure occurs in the
existing route when two neighbouring nodes associated
with the same route go out of the communication range with
relative velocity 12 km/hr after 30 sec. It can be observed from
Fig.3 that the initial path set up time for HA POSANT routing
algorithm is 120 msec and for RADAR POSANT routing
algorithm is 150 msec. The two neighbouring nodes having
relative velocity 12 km/hr cover a distance of 0.4 m (≈ 1 m) in
120 msec for HA POSANT routing algorithm and 0.5 m (≈ 1 m)
in 150 msec for RADAR POSANT routing algorithm. So the
packet loss and average packet delay of an ongoing session can
be minimized by triggering the route maintenance algorithm in
advance when the two neighbouring nodes associated with the
same OR are at a threshold distance of 99 m (100 m - 1 m) from
each other. During simulation it has been observed that the
time required to transmit a single packet using basic POSANT
routing algorithm is 40 msec whereas the time required for
transmitting a single packet using HA POSANT routing
algorithm and RADAR POSANT routing algorithm is 30 msec.
So the number of packets that can be transmitted using basic
POSANT routing algorithm in 30 sec is 700 whereas the
number of packets that can be transmitted using HA POSANT
routing algorithm and RADAR POSANT routing algorithm in
30 sec is 950 before the failure occurs in the existing route.
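
The threshold arithmetic above can be checked with a few lines. The function names are hypothetical, and rounding the drift up to a whole metre is an assumption that matches the "≈ 1 m" approximation in the text.

```python
import math


def kmh_to_m_per_s(speed_kmh: float) -> float:
    return speed_kmh * 1000.0 / 3600.0


def time_to_failure_s(range_m: float, relative_kmh: float) -> float:
    """Time two nodes moving apart need to cover the communication range."""
    return range_m / kmh_to_m_per_s(relative_kmh)


def drift_m(setup_time_ms: float, relative_kmh: float) -> float:
    """Distance the two nodes move apart while a new route is being set up."""
    return kmh_to_m_per_s(relative_kmh) * setup_time_ms / 1000.0


def threshold_m(range_m: float, setup_time_ms: float, relative_kmh: float) -> float:
    """Trigger distance: communication range minus the drift, rounded up to 1 m steps."""
    return range_m - math.ceil(drift_m(setup_time_ms, relative_kmh))
```

For a 100 m range and a relative velocity of 12 km/hr this gives a failure time of 30 sec, a drift of 0.4 m (HA POSANT, 120 msec) or 0.5 m (RADAR POSANT, 150 msec), and a threshold of 99 m in both cases.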


Figure 4: Average packet delay vs. Simulation time
(x-axis: simulation time for packet transmission, msec)
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Figure 5: Average packet delay vs. Number of packets 
received 


It can be observed from Fig.5 that the initial average packet
delay is higher in basic POSANT routing algorithm than in
the other two routing algorithms, due to its higher initial path
set up time as discussed in section 4.1.1. The new route is selected in
basic POSANT routing algorithm after the transmission of 700 
packets. The new route is selected in HA POSANT routing 
algorithm and RADAR POSANT routing algorithm after the 
transmission of 950 packets. The average packet delay in the 
new route for basic POSANT routing algorithm is also higher 
due to its higher initial path set up time than the other two 
routing algorithms. 


4.1.3 PERCENTAGE OF SUCCESSFULLY DELIVERED
PACKETS 

TABLE-4 shows the percentage of successfully delivered 
packets for the 3 routing algorithms. The new route discovery 
process starts after the failure occurs in the existing route in 
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generated during the time interval between the occurrence of 


route failure and finding out a new route are lost. The route , 


selects an alternative OR in advance 
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4.2 EXPERIMENTAL RESULTS FOR PHASE 
The simulation experiment is conducted to find the path set up 
time and average packet delay in MANEMO. 


4.2.1 PATH SET UP TIME

Fig.6 shows the path set up time in MANEMO. When a S_id
wants to initiate a session with D_id, MANEMO searches
MANET for the selection of an OR. If found, the OR is selected;
otherwise it searches NEMO for the selection of an OR. If the
selected OR in MANET fails due to the change in network
topology, MANEMO searches MANET again for the selection
of an alternative OR. If found, the alternative OR is used to
maintain the rest of the communication. Otherwise NEMO is
searched for the selection of the alternative OR. It can be
observed from Fig.6 that the path set up time in NEMO is
higher than the path set up time in MANET due to the Internet
access overhead.
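
The search order described above (MANET first, NEMO as fallback) can be sketched as follows. The function `select_or` and the two search callbacks are hypothetical names introduced for illustration; the paper does not give an implementation.

```python
from typing import Callable, Optional, Tuple

# A search callback takes (source id, destination id) and returns a route
# identifier, or None when no OR is available in that network.
Search = Callable[[str, str], Optional[str]]


def select_or(s_id: str, d_id: str,
              search_manet: Search,
              search_nemo: Search) -> Optional[Tuple[str, str]]:
    """Return (network, route), preferring MANET and falling back to NEMO."""
    route = search_manet(s_id, d_id)
    if route is not None:
        return ("MANET", route)
    route = search_nemo(s_id, d_id)
    if route is not None:
        return ("NEMO", route)
    return None  # no OR found in either network


if __name__ == "__main__":
    manet_down = lambda s, d: None            # MANET has no usable route
    nemo_up = lambda s, d: f"{s}->gw->{d}"    # NEMO route via a gateway
    print(select_or("S1", "D1", manet_down, nemo_up))
```

The same function can be called again after a route failure, which corresponds to the repeated MANET search for an alternative OR.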


4.2.2 AVERAGE PACKET DELAY

Fig.7 shows the plot of average packet delay vs. the number of
packets received in MANEMO. The maximum number of
packets that can be transmitted using HA POSANT routing
algorithm in MANET is 950 as discussed in section 4.1.2. In the
worst case no alternative route is found in MANET and the rest
of the packets are transmitted using the alternative route in NEMO. It
can be observed from Fig.6 that the initial path set up time in
MANET is 150 msec and in NEMO is 191 msec. So the total
time required to set up an alternate route is 341 msec. The two
neighboring nodes having relative velocity 12 km/hr, as
discussed in section 4.1.2, cover a distance of about 1 m in 341 msec.
So the packet loss and average packet delay of an ongoing
session can be minimized by triggering the route maintenance
algorithm in advance when the two neighboring nodes
associated with the same OR are at a threshold distance of 99 m
from each other. But at heavy load the initial path set up time in
both MANET and NEMO increases, which requires a reduction in
threshold distance to minimize packet loss and average delay in
communication. So the threshold distance is assumed as 95 m
during simulation. It can be observed from Fig.7 that the initial
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average packet delay is higher in NEMO due to its higher 
initial path set up time as discussed in section 4.2.1 than in
MANET. The new route is selected in NEMO after the
transmission of 950 packets using MANET which
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Figure 7: Average packet delay vs. Number of packets 
received in MANEMO

8.0 CONCLUSION 

The proposed work integrates MANET and NEMO technology 


integrated schemes have already been proposed so far but most 
of the researchers define the architecture and the purpose of 
integration. They did not present any simulation results. But the 
proposed integrated scheme has been simulated to observe the 


initial path set up time and average packet delay. The 
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performances of the proposed routing algorithms are evaluated 
considering only the data class of traffic. It can be extended by 
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ABSTRACT 

Mobile devices are used for conveying important information.
Opportunities exist to introduce users to different applications of
resource-constrained mobile devices. Currently, the client is
forced to continuously poll for updates from potentially 
different data sources, such as, e-commerce, on-line auctions, 
stock and weather sites, to stay up to date with potential 
changes in content. We employ a pair of proxies, located on the 


wireless data transfers to and from the mobile device. The 
client specifies her interest in changes to specific parts of pages
by highlighting portions of already loaded web pages in her
browser. The edge proxy polls the web servers involved, and if
relevant changes have occurred, it aggregates the updates as
one batch to be sent to the client. The proxy running on the 
mobile device can pull these updates from the edge proxy, 
either on-demand or periodically, or can listen for pushed 
updates initiated by the edge proxy. We also use SMS messages 
to indicate available updates and to inform the user of which 
pages have changed. 


KEYWORDS 
Mobile wireless communication, proxy process, caching,
pre-fetching, energy measurement.


1.0 INTRODUCTION 

In this paper we introduce an automated and efficient approach for
browsing HTML pages with dynamically changing content on
mobile devices. Following the fluctuations of a favorite
currency, stock value, or auction currently requires the user to
reload all the pages in order to capture any changes to the data.
The costs of these data transfers to the user come in many
forms, including slow data access, excessive battery
consumption on the device and inconvenience due to the user's
active involvement in constant data reload. Ideally, the data would
be seamlessly updated only when content of interest to the user changes. Our
approach greatly reduces the costs of updates by: i) allowing
the users to mark the parts of each page that are of interest to
them, ii) offloading the task of determining when those parts
have changed to a resource-rich proxy and iii) leveraging the
proxy for batching those updates and sending them to the user's
device periodically. We expect that our system will be useful in
two kinds of browsing situations. Our first target is providing
seamless low-cost content updates during active client web
browsing. Imagine a user browsing dynamic content on her
PDA during her daily commute or at an airport terminal
waiting for her flight. We leverage our resource-rich proxy to
save data transfers both for the case where the user wants to
keep up to date with rapidly changing content for her favorite
pages and for the case of browsing to random pages.

Our second target scenario is automatic periodic content refresh for the
user's favorite content, for subsequent browsing while
disconnected. This scenario corresponds to a user carrying a
handheld device in her pocket, and having her preferred content
(news, weather, stocks, etc.) automatically updated. We
deployed an actual proxy in our lab to which our mobile
device can connect using two alternative wireless networking
capabilities: 802.11 and cellular communication over GPRS.
Each of these networking capabilities offers different trade-offs
in terms of data download costs. Specifically, access to content
over cellular networks is ubiquitous and low power, but is
relatively slow. On the other hand, transfers over Wi-Fi
(802.11) are fast, but have high energy costs. Indeed, an 802.11
card can reduce the battery lifetime of a PDA by up to a factor
of six when in continuously active mode and by a factor of
nearly two when in power saving mode. We measure the data
transfer and energy savings for several dynamic content refresh
schemes. Specifically, we implement and compare a poll-based
scheme, where the mobile proxy periodically polls the edge
proxy for updates, and a push-based approach, where the edge
proxy pushes updates to the device based on a schedule [1][2].


2.0 OUR FRAMEWORK 


The client interaction with many of today's web servers is
repetitive in nature, such as constantly polling an eBay auction
to check the status of a bid, or refreshing a page that contains
stock quotes to track the changing values of a stock. While
browser caches support "get if modified since" mechanisms,
this typically fails to save any data transfers due to frequent
updates to parts of the page that are largely irrelevant to the
user. These changes include ad banners or the time of day, and
although the user may not be interested in them, they usually
result in the page being reloaded almost every time.


2.1 CLIENT INTERFACE 
The user specifies her interest in changes to specific parts of 
each page by highlighting portions of the web page on her 
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device screen, as illustrated in Figure 1. The end points of a 
highlighted region serve as the start and end points of an
annotation that the system captures.

To keep track of the mobile client's interest in specific page
regions even while the content changes, we use a
well-documented tree technique for maintaining robust HTML
document locations. This technique has been shown to
robustly keep track of a location within a web document, in the 
face of typical value changes to dynamic content and even in 
the case of structural changes to the document, such as 
paragraph reordering or deletion. 


2.2 ARCHITECTURE COMPONENTS 

There are two main components of our system: the mobile
device proxy and the edge server proxy. The mobile proxy
resides on the mobile device. It consists of a proxy that
intercepts client web requests, a cache for storing the responses
to previous requests, and a hardware manager which controls
the state of the wireless connections available on the device.
The mobile proxy's main job is to communicate with the edge
server proxy and process any cache updates. The hardware
manager on the mobile device is responsible for determining
which wireless interface the inter-proxy communication should
use. The hardware manager makes its decision based on user
defined preferences. The user can choose to prefer GPRS-only,
WiFi-only or an adaptive GPRS/WiFi hybrid with the goal of
optimizing energy consumption automatically. In the hybrid
case, the hardware manager bases its decision on which
interface to use on the size of the data to be transferred. A long
download of a large update on GPRS may consume more
energy overall than the equivalent transfer over 802.11, even if
the GPRS connection uses relatively less power.
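
The trade-off behind the hybrid mode can be illustrated with a simple energy model: energy is a fixed interface start-up cost plus power multiplied by transfer time. This is only a sketch. The power, throughput and start-up figures below are hypothetical placeholders, not measurements from this work, and `choose_interface` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    power_w: float          # average power while transferring (hypothetical)
    throughput_kb_s: float  # sustained throughput in KB/s (hypothetical)
    setup_energy_j: float   # fixed cost of bringing the link up (hypothetical)

    def transfer_energy_j(self, size_kb: float) -> float:
        """Energy = fixed start-up cost + power * transfer time."""
        return self.setup_energy_j + self.power_w * size_kb / self.throughput_kb_s


# Placeholder figures chosen only to show the crossover effect.
GPRS = Interface("GPRS", power_w=1.0, throughput_kb_s=5.0, setup_energy_j=0.0)
WIFI = Interface("WiFi", power_w=2.0, throughput_kb_s=500.0, setup_energy_j=10.0)


def choose_interface(size_kb: float, preference: str = "hybrid") -> Interface:
    """Honour a fixed user preference, or pick the cheaper link in hybrid mode."""
    if preference == "gprs-only":
        return GPRS
    if preference == "wifi-only":
        return WIFI
    return min((GPRS, WIFI), key=lambda link: link.transfer_energy_j(size_kb))


if __name__ == "__main__":
    for size in (10.0, 500.0):
        print(size, "KB ->", choose_interface(size).name)
```

Under these placeholder numbers the lower-power link wins for small updates and the faster link wins for large ones, which is the behaviour the hybrid mode is meant to exploit.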


2.2.1 MOBILE DEVICE COMPONENTS
1) Mobile device web browser 

2) Mobile proxy 

3) Cache 

4) Hardware Manager 


2.2.2 EDGE SERVER COMPONENTS 


The edge server proxy is placed on any well connected 
computer. The edge server proxy consists of four components: 
proxy process, cache manager, cache, and update manager. The 
proxy process is an event driven server which interacts with 
multiple clients and serves their requests either from the cache 
or by directly connecting to the web servers in question. The 
cache manager consists of an interface to the cache and a 
thread pool. The cache manager's responsibility is to keep the 
cache up to date. Each thread periodically polls the web servers 
that a particular cache entry references, checking for any 
changes. The cache stores the interest profiles for all the mobile 
devices that registered their interest with the edge proxy. When 
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a cached page is changed, the update manager adds a reference 
to the changed content to the update batch of each mobile 
device that has registered interest in that particular page [2] [7]. 
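
The bookkeeping performed by the update manager can be sketched as below. The class and method names are hypothetical and the data structures are an assumption; the sketch only shows how a change to a cached page could be added to the pending batch of every interested device and later removed on acknowledgement.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class UpdateManager:
    """Tracks which devices care about which pages and what is pending."""

    def __init__(self) -> None:
        self._interest: Dict[str, Set[str]] = defaultdict(set)   # url -> device ids
        self._pending: Dict[str, List[str]] = defaultdict(list)  # device id -> urls

    def register_interest(self, device_id: str, url: str) -> None:
        self._interest[url].add(device_id)

    def page_changed(self, url: str) -> None:
        """Add a reference to the changed page to each interested device's batch."""
        for device_id in self._interest.get(url, ()):
            if url not in self._pending[device_id]:
                self._pending[device_id].append(url)

    def next_batch(self, device_id: str) -> List[str]:
        """Return the pending updates for one device without clearing them."""
        return list(self._pending.get(device_id, []))

    def acknowledge(self, device_id: str, urls: Iterable[str]) -> None:
        """Drop updates the device has confirmed receiving."""
        done = set(urls)
        self._pending[device_id] = [
            u for u in self._pending.get(device_id, []) if u not in done
        ]


if __name__ == "__main__":
    um = UpdateManager()
    um.register_interest("dev-1", "http://example.com/stocks")
    um.page_changed("http://example.com/stocks")
    print(um.next_batch("dev-1"))
```

Keeping updates until they are acknowledged means a batch lost on the wireless link can simply be sent again.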


2.3 OPERATION 

When a mobile device first joins the system, it registers with 
the edge server proxy. The edge server proxy assigns each 
device a unique id so that it can subsequently differentiate 
between devices in the system. Differentiating based on IP 
address is not a sufficient means, since a mobile device may 
change IP addresses several times each day. When a request is 


response (local cache hit), the response is returned immediately
to the web browser and no wireless communication occurs. If
the response is not found in the cache (local cache miss), the
mobile proxy forwards the request to the edge server proxy.
The edge server proxy, in turn, checks its cache for the
response and returns it from its cache if it is there. Otherwise
the request is forwarded to the actual web server. If the
response is an HTML page, the edge server proxy prefetches all
the embedded objects within that page and batches them with
the response to be delivered to the mobile client in one
transmission. Any pending cache updates are also included in
the batch transfer. Upon receiving the response from the edge
server, the mobile proxy caches the response and updates its
cache with any other additional files included in the transfer.
The response is then returned to the web browser. The client
proxy acknowledges the receipt of any updates, such that the
edge server proxy can remove those updates from the update
manager's list for that device. In our system, the mobile proxy
learns that cache updates are available through three alternative


In the polling based scheme, the mobile proxy periodically 
polls the edge server proxy asking whether any updates are 
available. This periodic content refresh occurs automatically 
during active browsing sessions in order to keep the local client
cache up to date, and in turn to minimize client perceived 
staleness and waiting time. Alternatively, for the push based 
approach, the mobile proxy listens on a particular port for 
incoming updates initiated by the edge server proxy. In this 
situation, the edge server proxy requires a valid IP address for 
the client. [4] 
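
The request path described in this section (local cache, then edge cache, then the origin web server) can be summarized in a minimal sketch. The caches are modelled as plain dictionaries and `fetch_origin` is a hypothetical callback; prefetching of embedded objects and batching of pending updates are deliberately left out.

```python
from typing import Callable, Dict, Tuple


def handle_request(url: str,
                   local_cache: Dict[str, bytes],
                   edge_cache: Dict[str, bytes],
                   fetch_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Resolve a browser request and report where the response came from."""
    if url in local_cache:
        # Local cache hit: no wireless communication is needed.
        return local_cache[url], "local cache hit"

    if url in edge_cache:
        body, source = edge_cache[url], "edge cache hit"
    else:
        # Edge cache miss: the edge proxy contacts the actual web server.
        body = fetch_origin(url)
        edge_cache[url] = body
        source = "origin fetch"

    # The mobile proxy caches the response before returning it to the browser.
    local_cache[url] = body
    return body, source


if __name__ == "__main__":
    local, edge = {}, {}
    origin = lambda u: b"<html>quotes</html>"
    print(handle_request("http://example.com/", local, edge, origin)[1])
    print(handle_request("http://example.com/", local, edge, origin)[1])
```

The second call in the usage example is served from the local cache, which is the case where no wireless transfer occurs.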


3.1 PROXY-BASED CONFIGURATIONS USED FOR
COMPARISON

In the following section, we describe in detail the various
proxy-based and standalone configurations we use for
comparison. By disabling some of the features of our main
proxy approach, we are able to demonstrate what aspects
contribute to the overall wireless communication savings.
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3.1.1 BASELINE CONFIGURATION WITHOUT PROXY 
In our baseline configuration, the browser running on the
mobile device polls all web sites periodically for the pages
opened by the client for any change in the content. No proxies
are used in this configuration. However, the browser’s cache is
fully functional.


3.1.2 SIMPLE PROXY 

In this configuration, we run the two proxies, the mobile device
proxy and the edge proxy, and we use the edge proxy to poll
for any changes to the data occurring at each separate data
source. The proxy schedules an update to be sent to the client
when there is any change to a web page. The edge proxy
aggregates all updates to be sent to the user as described in our
main algorithm. The mobile proxy pulls updates both upon a
cache miss and periodically with the same interval as that of
polling in the baseline configuration. [7]


3.1.3 INTELLIGENT PROXY 

The intelligent proxy configuration is our proxy-based 
approach which filters out any updates to the mobile device if 
the parts of the page that the client is interested in have not 
changed. The client specifies interest by highlighting page 
regions through the interface. [4]


3.1.4 THRESHOLDS PROXY

The thresholds proxy is an enhanced intelligent proxy where 
the client specifies her regions of interest within a web page, 
but can also specify a threshold of significant change for each 
numerical value. All updates for numerical value changes that 
are below the significant change threshold are filtered out by 
the edge proxy. We use both a polling based and push-based 
thresholds proxy in our experiments. One drawback of our 
experimental setup is that our edge server is operating outside 
the Rogers GPRS network. As a result, our edge server is 
unable to create a connection to the device over GPRS as all 
incoming communication from an external source is blocked by 
the Rogers firewall. In order to facilitate push-based 
experiments over GPRS, our mobile proxy creates a persistent 
TCP connection with the edge server. Updates are then pushed 
to the mobile device over this connection. [2][5][7] 
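
The filtering rule of the thresholds proxy can be sketched as follows. The function name, the dictionary layout and the choice to forward changes that are equal to the threshold are assumptions made for illustration; the text only states that changes below the threshold are filtered out.

```python
from typing import Dict


def significant_changes(old: Dict[str, float],
                        new: Dict[str, float],
                        thresholds: Dict[str, float]) -> Dict[str, float]:
    """Return only the numerical values whose change reaches its threshold.

    Keys identify the highlighted regions; a region without a threshold is
    forwarded on any change (threshold 0.0).
    """
    updates: Dict[str, float] = {}
    for key, new_value in new.items():
        if key not in old:
            updates[key] = new_value          # first observation of this region
            continue
        delta = abs(new_value - old[key])
        if delta > 0.0 and delta >= thresholds.get(key, 0.0):
            updates[key] = new_value
    return updates


if __name__ == "__main__":
    previous = {"stock": 100.00, "bid": 50.0}
    current = {"stock": 100.04, "bid": 55.0}
    limits = {"stock": 0.5, "bid": 1.0}
    print(significant_changes(previous, current, limits))
```

In the usage example the small stock movement is suppressed while the larger bid change is forwarded, so only one update would be batched for the device.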


3.2 PARAMETERS USED IN EACH CONFIGURATION 
We use Internet Explorer (IE) as our web browser on the 
mobile device. However, we use a simple wrapper around it to 
mimic the user and drive the experiments. All communication 
uses HTTP/1.1. In our baseline configuration, IE is running 
alone on the mobile device. The web browser contains a cache 
of its own, and as a result, after the first round of 
communication, the majority of the requests consist of
if-modified-since requests from the browser for validating the
cached items. The browser is set up to visit the four sites in the 
trace, once every 4 minutes. This means that over the 3 hour 
experiment, each of the 4 websites in the trace is loaded 45 
times. The period with which the pages are loaded is irrelevant, 
except for allowing the experiment to complete in a reasonable 
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amount of time and to allow for full download of the respective 
pages over Wi-Fi or GPRS.[5][6] 


3.3 EXPERIMENTAL SETUP FOR
MEASUREMENTS 

The most commonly used method for automated measurement 
of power dissipation in a mobile device uses a precision 
ammeter. In this traditional method, the device is powered by a 
low-noise constant voltage source. The precision ammeter, 
equipped with a serial communication interface, is placed in 
series with the device’s power delivery path. Energy is
computed as a function of the measured current and supply
voltage. This approach can result in very high accuracy, low
bandwidth current measurements, but it is not practical for
today's low-voltage devices which typically operate from a
single Lithium-Ion cell. During startup, the high in-rush
current, I_in, causes the device's voltage supply, V_in, to drop due
to the relatively large internal ammeter sensing resistance (5 Ω)
and its parasitic inductance. In many cases, this drop causes the
internal power management protection circuit included in
newer devices to suspend startup. Hence, the traditional power
measurement technique becomes infeasible, as we experienced
first-hand with our transition from an older device to a more
modern version.

Calculated using formula where fs is the oscilloscope
sampling frequency [2][8] 
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Table 1: mccum comm 


Cache hits/misses are only for the mobile proxy and not the web
browser's cache. Updates are the number of page changes that
occurred; several updates may be sent in one batched transfer.
In contrast to the data transmission graph, as a result, the
simple proxy method downloads nearly the entire batch of data
each period. The intelligent proxy reduces much of the wireless
data received by reducing the number of updates the edge
server proxy sends to the mobile client. As we can see from
Table 1 [2][7], by only sending updates when parts of the page
of interest to the user change, we reduce the total number of
updates by a factor of 3. This translates into a 65.2% reduction
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in the amount of data received when compared to the baseline 


proxy-less approach. Finally, the thresholds proxy reduces the
amount of data received over the wireless link. 


3.3.1 ENERGY CONSUMPTION 

Figure 1 shows the average energy consumed by the device per
download period (i.e., loading each of the 4 web sites in the
browser) over the 3 hour experiment for the baseline proxy-less
configuration using the 802.11 connection and using the GPRS
connection, and for our poll-based thresholds proxy configuration
using the 802.11 connection and using the GPRS connection.














Figure 1: Average energy expenditure per download period 


We can see that all configurations of the thresholds proxy are 
superior in energy conservation compared to their proxy-less
counterparts. Our proxy system reduces energy costs by factors 
of 2.1 and 4.5 when used over the 802.11 and GPRS 
connection, respectively. 

3.3.2 ENERGY CONSUMPTION IN PUSH VERSUS
POLL PROXY

As shown in Fig. 1 [2], the differences in energy consumption
between the push-based and poll-based proxies are small for 
both Wi-Fi and GPRS. The push-based proxy using the GPRS 
connection conserves 7% energy per download period 
compared to the poll-based proxy. Maintaining a persistent 
connection with the edge server in the push based configuration 
is more energy-efficient than requiring the device to create a 
connection, request an update, and tear down the connection 
during each period in this case. The push based proxy using the 
Wi-Fi connection on the other hand, uses 4% more energy per 
download period than its polling based counterpart. 
Constantly listening for incoming communication over the Wi- 
Fi connection requires more energy than periodically sending 
update request packets. This push based proxy could 
potentially save more than the polling approach, if the device 
used a small listening window for receiving updates [9][10]. 
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40 ENERGY CONSUMPTION FOR OFF-LINE 

UPDATES USING THE HYBRID APPROACH 
In this section, we analyze the energy consumption of the SMS
based proxy system. The mobile proxy requests an update only
when it receives an SMS message specifically informing it that
there are updates available. The proxy uses the control
information contained in this SMS message to determine the
best interface to use for downloading the update. The proxy
uses GPRS to download any updates under 30 KB and Wi-Fi
for updates over this threshold.
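
The interface choice triggered by an SMS notification can be sketched as below. The 30 KB threshold comes from the text; the SMS payload format (`UPDATE size_kb=... pages=...`), the function names, and the handling of an update of exactly 30 KB (sent over Wi-Fi here) are assumptions for illustration only.

```python
from typing import Dict

SIZE_THRESHOLD_KB = 30.0  # from the text: GPRS below, Wi-Fi above


def parse_update_sms(text: str) -> Dict[str, float]:
    """Parse a hypothetical notification such as 'UPDATE size_kb=42 pages=3'."""
    parts = text.split()
    if not parts or parts[0] != "UPDATE":
        raise ValueError("not an update notification")
    fields = dict(part.split("=", 1) for part in parts[1:])
    return {
        "size_kb": float(fields["size_kb"]),
        "pages": float(fields.get("pages", "1")),
    }


def interface_for_update(size_kb: float) -> str:
    """Pick the download interface from the announced update size."""
    return "GPRS" if size_kb < SIZE_THRESHOLD_KB else "Wi-Fi"


if __name__ == "__main__":
    for sms in ("UPDATE size_kb=12 pages=2", "UPDATE size_kb=120 pages=4"):
        info = parse_update_sms(sms)
        print(sms, "->", interface_for_update(info["size_kb"]))
```

Because the device acts only when a notification arrives, neither periodic polling nor a continuously listening radio is needed in this mode.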








Figure 2: Average energy expenditure per download period 
The average energy consumption of this proxy is shown in
Fig. 2 [2], along with the best results for the GPRS and Wi-Fi only
proxies. The hybrid SMS proxy saves an additional 14% energy 
over the push based GPRS proxy and 10% over the polling Wi- 
Fi proxy. The savings are the result of not having to send 
periodic update requests, or conversely, listening over the 
are available.[7] 


4.1 ENERGY CONSUMPTION FOR NEW PAGE
ACCESSES

To determine the energy consumption for the case of visiting
new pages, i.e., a cold cache miss, we ran an experiment where
we viewed one of the pages in our trace with an empty browser
cache and an empty proxy cache. The page used in this
experiment contained 51 embedded files, consisting of dozens
of small images, a couple of style sheets, and several JavaScript
files. We used a proxy-less setup as baseline for comparison.
The total energy required to view the selected page in each
configuration is shown in Fig. 3 [2]. Compared to the proxy-less
approach, the energy expenditure is reduced by 69% when using
the GPRS connection in conjunction with our proxy system, and
by 15% when using the Wi-Fi connection (see Fig. 3 [2]), and
12.8 for proxy system (see Fig. 3 [2]).


Dynamic Data Updates for Mobile Devices by Using 802.11 Wireless Communications





Figure 3: Total energy cost for downloading a page 


5.0 CONCLUSIONS 

In this paper we introduced an approach to automatic
data refresh for mobile devices. Our approach is centered
around a general purpose mechanism for letting the user specify
her interest in changes to specific parts of pages. We avoid
introducing new languages or complex interfaces that may
prevent wide acceptance. Instead, the user loads her favorite
pages on her mobile device browser and highlights areas of
interest in those pages using the regular browser’s cursor. We
offload the detection of updates to content that matches the
user's interest onto a fully-connected edge proxy.
Subsequently, either while the client is actively browsing or
while attending to everyday activities of travel, shopping, work
and play, the mobile device performs automatic data refresh
transparently to the user.

Our approach is fully implemented using both Wi-Fi and GPRS 
communication on an actual mobile device and evaluated on 
real world data traces. Our results show that our general 
purpose proxy system saves data transfers to and from the 
mobile device by an order of magnitude and battery 
consumption by up to a factor of 4.5. These savings are due to 
the fact that, typically, there are frequent changes to parts of 
dynamic content web pages that the user is not interested in, 
such as the time of day or an ad banner. In addition, many 
changes in the n-th decimal of numerical values can be 
typically ignored. We have shown that a push-based approach
provides minimal gains over a poll-based approach.
Additionally, we have shown that by using the existing SMS 
infrastructure to deliver notifications on dynamic content 
changes, we can offer an energy efficient and user friendly way 
to keep the clients up to date with their content of interest. 
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ABSTRACT 

In this work the encryption mechanism in MANET & WSN is
considered. One of the very important parameters with MANET 
& WSN is its low computing power availability in its real time 


power. This study is based on a Mathematical model [11] 
being used for encryption process, which consumes less power 
when compared to standard algorithms like 3 DES & RSA. This 
model generates a distributed sequence which is used as sub 
key. To generate sub key, variable time stamps are used. These 
time stamps are going to be generated by the same model. The 
model is also studied for handling errors in data transmission 
particularly in MANET/WSN environments. 


KEYWORDS

Cubic spline interpolation, Encryption Decryption Mechanism,
Gaussian probability density function, Key & Sub key,
Random number generators, Time stamp and Nonce,
Tridiagonal matrix algorithm.

1.0 INTRODUCTION 

Historically, encryption schemes were the first central area of
interest in cryptography [2-12]. They deal with providing means
to enable private communication over an insecure channel. A
sender wishes to transmit information to a receiver over an
insecure channel, that is, a channel which may be tapped by an
adversary.

Thus, the information to be communicated, which we call the
plaintext, must be transformed (encrypted) to a cipher text, a
form not legible by anybody other than the intended receiver.
The latter must be given some way to decrypt the cipher text,
i.e. retrieve the original message, while this must not be
possible for an adversary. This is where keys come into play;


transforms plaintexts into cipher texts while the decryption 
algorithm converts cipher texts back into plaintexts. A third 
algorithm, called the key generator, creates pairs of keys: an 
encryption key, input to the encryption algorithm, and a related 


decryption key needed to decrypt. This work mainly deals with
the key generator, which generates sub keys that provide
sufficient strength to the encryption mechanism.

Partial differential equations to model multiscale phenomena 
are ubiquitous in industrial applications and their numerical 
solution is an outstanding challenge within the field of 
scientific computing[11]. The approach is to process the 
mathematical model at the level of the equations, before 
discretization, either removing non-essential small scales when 
possible, or exploiting special features of the small scales such
as self-similarity or scale separation to formulate more tractable
computational problems. 


2.0 LITERATURE SURVEY 
Currently a lot of work is going on regarding the performance of
MANETs (Mobile Ad hoc Networks) and WSNs (Wireless Sensor
Networks) [1], where the study depends on TCP performance.
The underlying study with these things is
power consumption of the mechanisms and security

justify the use of TCP variants for loss of packets due to
random noise introduced in MANETs and WSNs. Another
important parameter in MANETs & WSNs is the need for low
power consumption of mechanisms. In their work [5], the
authors proposed a mechanism which requires least power
expended for each node to transmit just enough power to
ensure reliable communication. Security to data transmitted is 
one more important parameter to be considered in MANETs 
and WSNs. In the work [15], the authors proposed a security 
mechanism where canned security solutions like IP Security 


may not work. In the work [11], the authors presented a
mathematical model for generation of sub keys, which can be
used for encryption & decryption purposes and which provides
security. The advantage of this model is that it consumes less
power when compared to conventional algorithms, which
makes it more suitable for MANETs and WSNs. One more
important issue to be considered in MANETs and WSNs is the
effect of noise on data transfer. In their work [22], the authors
presented two analytical models to describe the noise levels in
real network applications. In this work an attempt has been
made to identify the effects of noise on security models [20],
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and means to overcome them by generating a random number 
generator based on Gaussian distribution. 


3.0 NUMERICAL DATA ANALYSIS
The following are the steps to generate a numerical method for
data analysis [21].


3.1 DISCRETIZATION METHODS

The numerical solution of data flow and other related processes
can begin when the laws governing these processes have been
expressed in mathematical form, generally in terms of
differential equations. The
equations that we shall encounter express a certain
conservation principle. Each equation employs a certain
quantity as its dependent variable and implies that there must
be a balance among various factors that influence the variable.
The numerical solution of a differential equation consists of a
set of numbers from which the distribution of the dependent
variable can be constructed. In this sense a numerical method is
akin to a laboratory experiment in which a set of experimental
readings enables us to establish the distribution of the measured
quantity in the domain under investigation.

Let us suppose that we decide to represent the variation of $\phi$
by a polynomial in x,
$$\phi = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$$
and employ a numerical method to find the finite number of
coefficients $a_0, a_1, a_2, \dots, a_n$. This will enable us to evaluate
$\phi$ at any location x by substituting the value of x and the
values of the a's in the above equation.


3.2 STEADY ONE DIMENSIONAL DATA FLOW
Steady state one-dimensional equation is given by
$$\frac{\partial}{\partial x}\left(k \frac{\partial T}{\partial x}\right) + s = 0$$
where k & s are constants. To derive the
discretisation equation we shall employ the grid point cluster.
We focus attention on grid point P, which has grid points E, W
as neighbors. For the one dimensional problem under consideration
we shall assume a unit thickness in y and z directions. Thus the
volume of the control volume is $\Delta x \times 1 \times 1$.
If we integrate the above equation over the control
volume, we get
$$\left(k \frac{\partial T}{\partial x}\right)_e - \left(k \frac{\partial T}{\partial x}\right)_w + \int_w^e s \, dx = 0 \quad \text{(eq. 1)}$$
If we evaluate the derivatives $\partial T / \partial x$ in the above equation
from a piecewise linear profile, the resulting equation will be
$$\frac{k_e (T_E - T_P)}{(\delta x)_e} - \frac{k_w (T_P - T_W)}{(\delta x)_w} + \bar{S} \, \Delta x = 0 \quad \text{(eq. 2)}$$
where $\bar{S}$ is the average value of s over the control volume.
Writing the average source as $\bar{S} = S_C + S_P T_P$, this leads to
the discretization equation
$$a_P T_P = a_E T_E + a_W T_W + b \quad \text{(eq. 3)}$$
with
$$a_E = \frac{k_e}{(\delta x)_e}, \quad a_W = \frac{k_w}{(\delta x)_w}, \quad a_P = a_E + a_W - S_P \, \Delta x, \quad b = S_C \, \Delta x$$


3.3 SOLUTION OF LINEAR ALGEBRAIC EQUATIONS
The solution of the discretisation equations for the
one-dimensional situation can be obtained
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by the standard Gaussian elimination method. Because of the 
particularly simple form of equations, the elimination process 
leads to a delightfully convenient algorithm. 

For convenience in presenting the algorithm, it is necessary to
use a somewhat different nomenclature. Suppose the grid points are
numbered 1, 2, 3, ..., ni, with 1 and ni denoting the boundary points.

The discretisation equation can be written as

$A_i T_i + B_i T_{i+1} + C_i T_{i-1} = D_i$

for i = 1, 2, 3, ..., ni. Thus the data value $T_i$ is related to the
neighboring data values $T_{i+1}$ and $T_{i-1}$. For the given problem
$C_1 = 0$ and $B_{ni} = 0$.

Referring to the tridiagonal matrix of coefficients above, the
coefficients are first modified by forward elimination:
$A_i \leftarrow A_i - (C_i / A_{i-1})\,B_{i-1}$, where i = 2, 3, ..., ni (eq. 4)
$D_i \leftarrow D_i - (C_i / A_{i-1})\,D_{i-1}$ (eq. 5)
Then the unknowns are computed by back substitution:
$T_{ni} = D_{ni} / A_{ni}$ (eq. 6)
$T_k = (D_k - B_k T_{k+1}) / A_k$, for k = ni-1, ni-2, ..., 3, 2, 1 (eq. 7)
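
The elimination and back substitution steps described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `solve_tridiagonal` and the zero-based list layout are not from the paper, and the coefficient convention is taken to be A for the grid point itself, B for the succeeding point and C for the preceding point.

```python
def solve_tridiagonal(A, B, C, D):
    """Solve A[i]*T[i] + B[i]*T[i+1] + C[i]*T[i-1] = D[i] for T.

    A: coefficient of the grid point itself.
    B: coefficient of the succeeding grid point (B[-1] is ignored).
    C: coefficient of the preceding grid point (C[0] is ignored).
    D: right-hand side.
    Indices are zero-based, so list position 0 is grid point 1 of the text.
    """
    n = len(A)
    a = list(A)  # work on copies so the caller's lists are not modified
    d = list(D)

    # Forward elimination: remove the T[i-1] term from every row i >= 1.
    for i in range(1, n):
        factor = C[i] / a[i - 1]
        a[i] = a[i] - factor * B[i - 1]
        d[i] = d[i] - factor * d[i - 1]

    # Back substitution: the last unknown first, then upwards.
    T = [0.0] * n
    T[n - 1] = d[n - 1] / a[n - 1]
    for k in range(n - 2, -1, -1):
        T[k] = (d[k] - B[k] * T[k + 1]) / a[k]
    return T


if __name__ == "__main__":
    # Small hypothetical system whose exact solution is T = [1, 2, 3].
    print(solve_tridiagonal([2.0, 2.0, 2.0], [1.0, 1.0, 0.0],
                            [0.0, 1.0, 1.0], [4.0, 8.0, 8.0]))
```

The sketch assumes that no pivot `a[i - 1]` becomes zero, which holds for diagonally dominant systems such as the ones produced by the discretisation above.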
4.0 MATHEMATICAL MODEL 


The approach to time series analysis is the establishment of a
mathematical model describing the observed system.
Depending on the nature of the problem, a linear or
nonlinear model is developed. This model can be used to
generate data at different times and map it to plain text to
generate cipher text.


4.1 LINEAR DATA FLOW PROBLEM 

The initialization vector (IV) considered in the problem is 
When t = 0, T(I) = Y(I) 2300, where I = 1, 2, ..., M.

The problem area is divided into M grid points and, for
simplicity, the data values at the first and Mth grid points are
considered to be known and constant.

For the grid points 2 to M-1, the coefficients can be represented
by considering the conservation equation,

$\frac{\alpha}{\delta x}\left(T_{i+1}^{1} - T_i^{1}\right) + \frac{\alpha}{\delta x}\left(T_{i-1}^{1} - T_i^{1}\right) = \frac{\delta x}{\delta t}\left(T_i^{1} - T_i^{0}\right)$ (eq. 8)

where $T_i^{0}$ represents the data value for the considered grid point for
the preceding delt, and $T_{i-1}^{1}$ and $T_{i+1}^{1}$ represent the data values
for the preceding and succeeding grid points for the current
delt. Here $\delta x$ and $\delta t$ denote delx and delt.

Considering α, which is the key for the given model, the
coefficients are obtained for each state (grid point) in terms of
A(I), B(I), C(I) and D(I). A(I) refers to the data value of the
corresponding grid point, C(I) and B(I) refer to the data values of the
preceding and succeeding grid points for the current delt, and D(I)
refers to the data value of the considered grid point in the
preceding delt.


A(I) = 1 + 2 α delt/(delx)**2 (eq. 9)
B(I) = -α delt/(delx)**2 (eq. 10)
C(I) = -α delt/(delx)**2 (eq. 11)
D(I) = $T_i^{0}$ (eq. 12)




Study of the Effects of Noise & Future Time Stamps on a New Model Based Encryption Mechanism 


4.2 PROCEDURE FOR GENERATING DATA FROM
COEFFICIENTS BY TRIDIAGONAL METHOD

Using the coefficients of the grid points and the
tridiagonal matrix algorithm, the data distribution is calculated.
The grid points are numbered 1, 2, 3, ..., M, with points 1
and M denoting the extreme states.

The discretisation equation can be written as

$A_i T_i + B_i T_{i+1} + C_i T_{i-1} = D_i$

for i = 1, 2, 3, ..., M. Thus the data value Ti is related to the
neighboring data values Ti+1 and Ti-1. For the given problem C1 = 0 and
BM = 0, as T1 and TM represent boundary states.

These conditions imply that T1 is known in terms of T2. The
equation for i = 2 is a relation between T1, T2 and T3. But since
T1 can be expressed in terms of T2, this relation reduces to a
relation between T2 and T3. This process of substitution can be
continued until TM-1 is formally expressed in terms of TM. But
since TM is known, we can obtain TM-1. This enables us to
begin the back substitution process in which TM-2, TM-3, ..., T3, T2
are obtained. This process is continued until further
iterations cease to produce any significant change in the values
of the T's. Finally the data distribution is obtained for all grid
points for different times by considering a suitable α, which is
used as the key.
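
As a rough illustration of the procedure above, the following Python sketch advances the grid values through several time steps using the coefficients A(I), B(I), C(I) and D(I) of section 4.1 and a tridiagonal solve. The function names, the zero-based indexing and the boundary rows (A = 1, B = C = 0, so that the first and Mth values stay constant) are assumptions made for this sketch. How the paper converts the resulting values into the integer sub keys reported later is not specified, so that step is left out.

```python
def solve_tridiagonal(A, B, C, D):
    """Solve A[i]*T[i] + B[i]*T[i+1] + C[i]*T[i-1] = D[i] for all i."""
    n = len(A)
    a, d = list(A), list(D)
    for i in range(1, n):                  # forward elimination
        factor = C[i] / a[i - 1]
        a[i] -= factor * B[i - 1]
        d[i] -= factor * d[i - 1]
    T = [0.0] * n
    T[-1] = d[-1] / a[-1]                  # last unknown
    for k in range(n - 2, -1, -1):         # back substitution
        T[k] = (d[k] - B[k] * T[k + 1]) / a[k]
    return T


def generate_data(initial, alpha, delt, delx, steps):
    """Return the grid values after each of `steps` time steps.

    initial: values at the M grid points at t = 0 (the initialization vector).
    alpha:   the key of the model.
    The first and last grid points are held constant (assumed boundary rows).
    """
    M = len(initial)
    r = alpha * delt / delx ** 2
    A = [1.0] + [1.0 + 2.0 * r] * (M - 2) + [1.0]   # eq. 9 for interior points
    B = [0.0] + [-r] * (M - 2) + [0.0]              # eq. 10
    C = [0.0] + [-r] * (M - 2) + [0.0]              # eq. 11
    T = [float(v) for v in initial]
    history = []
    for _ in range(steps):
        # D(I) is the value of each grid point at the preceding time level.
        T = solve_tridiagonal(A, B, C, T)
        history.append(T)
    return history


if __name__ == "__main__":
    # Hypothetical initialization vector; key and step sizes as in the text.
    for row in generate_data([0, 0, 10, 0, 0], alpha=4.0, delt=2.0, delx=2.0, steps=3):
        print([round(v, 3) for v in row])
```

Changing `alpha`, `delt` or the number of steps yields a different sequence from the same initial vector, which is the property the model relies on for generating sub keys.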


5.0 IDENTIFYING FUTURE TIME STAMPS FOR DATA 
GENERATION 

In the given mathematical model [11], the data is calculated for
different time stamps, which are fixed in nature. If variable
time stamps are used, the cryptanalysis of the algorithm
becomes more complex, which increases the strength of the algorithm.
To calculate variable time stamps the same model [11] can be
used. The key remains the same, and the initial delt and delx also
remain the same. An initial time stamp is considered, and any
random (intermediate) time stamp is also considered. By using the
coefficients generated by the model together with the initial and
intermediate time stamps, interpolated and future time stamps are
generated.
These time stamps are considered in the model to generate sub
key values.


6.0 EFFECT OF TRANSMISSION ERRORS ON DATA 
TRANSFER 

The encrypted form of data will be subjected to errors during
the transmission process due to noise sources. These
errors can affect the integrity of the message or data transfer. The
effects of these errors are checked in the present study by
modeling the error as a random number having a Gaussian
probability density function. The random number generator
thus modeled is used to create values of the possible data errors.
These errors are stored in a sub database which can be made
use of when a corrupted sub key is received at the receiver's side.
Thus when the received message after decryption shows
any ambiguity in its meaning or any integrity variations
because of noise, it can be checked using the sub database
developed by the random number generator model.
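
One possible reading of this idea is sketched below in Python. It is not the authors' implementation: the standard deviation `sigma`, the rounding of errors to integers, the function names and the lookup strategy (subtracting each stored error from the received value and keeping the results that are valid sub keys) are all assumptions introduced for illustration.

```python
import random


def build_error_database(count, sigma, seed=0):
    """Draw `count` Gaussian errors (mean 0, std `sigma`) and store the
    distinct integer-rounded values as a sorted 'sub database'."""
    rng = random.Random(seed)  # seeded so that both sides can rebuild it
    return sorted({round(rng.gauss(0.0, sigma)) for _ in range(count)})


def candidate_subkeys(received, valid_subkeys, error_db):
    """Return the valid sub keys that could have produced `received`
    after being corrupted by one of the stored error values."""
    valid = set(valid_subkeys)
    return sorted({received - e for e in error_db if (received - e) in valid})


if __name__ == "__main__":
    error_db = build_error_database(count=1000, sigma=2.0)
    # Hypothetical sub keys and a hypothetical corrupted value.
    print(candidate_subkeys(19, [18, 10, 17], error_db))
```

When more than one candidate is returned, the receiver would still need another criterion, for example the meaning of the decrypted message as mentioned in the text, to choose among them.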

By considering a suitable key α = 4, delt = 2, delx = 2:

Initial time stamp =2, Intermediate time stamp=6; 
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Interpolated and Future time stamps generated by using the 
model are 


3, 4.8, 5.99, 7.2, 8.6, 10.2, 12.8.


Different data values obtained using these time stamps are
3067533811123222292880 18 10 17 11 1 19; 

7.0 RESULTS 

By considering a suitable key α = 4, delt = 2, delx = 2 for a
total time stamp of 6 units,

different data values obtained are


For delt = 2, time = 2;
306753381112322229288018 10 17 11 1 19; 

For delt =2, time=4; 
332262712101029126133325418811 

For delt =2, time=6; 

97255309 180231 17 1566140831 22; 

Thus by using the same key, by changing the time stamp values 
different sequences can be generated which are used as sub 
keys. Those sub keys can be mapped to plain text to generate 
cipher text [13,15]. 


8.0 SECURITY ANALYSIS 
Analysis by Construction: In the given model, a single valued 


stamps, the model generates different sub key values which
provide sufficient security against cryptanalysis. Since the
model involves not only the key but also interpolated and future
time stamps, it is relatively free from cipher text attacks and known
plain text and cipher text attacks. The given model is studied for
its improved performance against noise without compromising
the security of the mechanism.


9.0 CONCLUSION & FUTURE WORK

This encryption mechanism uses an Initialization Vector,
Interpolated and Future Time Stamps and a Key to generate
distributed sequences which are used as sub-keys. The model is
studied for its improved strength against noise, which is an
unavoidable feature of MANETs and WSNs. The model using
a nonlinear key can also be studied for its strength against
noise.
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ABSTRACT 

Simulation and comparison of the routing protocols for 
network topology hold a significant position in the performance 
evaluation of wireless networks. This paper discusses the
performance evaluation of Ad-hoc On-demand Distance Vector
(AODV) and Dynamic Source Routing (DSR), routing 
protocols for static WSN using NS-2. Energy efficiency, 
latency, throughput and fairness characteristics in different 
conditions are investigated under different load conditions on 
two-hop and multi-hop network. The comparison results reveal 
that AODV performs better in the network with strict 
requirement on time, whereas DSR is more adaptable in the 
networks with high throughputs and energy constraints. 


KEYWORDS 

Wireless Sensor Network (WSN), Dynamic Source Routing 
(DSR), Ad-Hoc On-demand Distance Vector (AODV), energy 
efficiency, latency, throughput, fairness, NS-2 (network 
simulator-2) 


1.0 INTRODUCTION 

Wireless sensor networking is an emerging technology that has 
a wide range of potential applications including environment 
monitoring, smart spaces, medical systems and robotic 
exploration [6]. 

Such networks will consist of large numbers of distributed 
nodes that organize themselves into a multi-hop wireless 
network. Each node has one or more sensors, embedded 
processors and low-power radios, and is normally battery 
operated. Typically, these nodes coordinate to perform a 
common task. Due to their energy constraints, wireless sensor
networks have to take energy consumption into
consideration while performing various tasks [6]. Hence these
are Energy-Aware Wireless Sensor Networks.

While many aspects of WSN have already been investigated, 
this paper concentrates on the characteristics of 
the routing protocols, in particular on the AODV and DSR 
protocols. 

AODV is a distance vector type routing protocol [3]. It does not require
nodes to maintain routes to destinations that are not actively
used. The protocol uses different messages to discover and
maintain links: Route Requests (RREQs), Route Replies
(RREPs), and Route Errors (RERRs). These message types are
received via UDP, and normal IP header processing applies.
The DSR protocol works on demand, i.e. without any periodic
updates. Packets carry along the complete path they should
take. This reduces the overhead of large routing updates in the
network. The nodes store all known routes in their cache. The
protocol is composed of route discovery and route maintenance
[3].

Both the protocols are implemented in the network layer and 
the MAC layer protocol used is 802.11. The IEEE 802.11 
Standard is by far the most widely deployed wireless LAN 
protocol. This standard specifies the physical, MAC and link 
layer operation. Multiple physical layer encoding schemes are 
defined, each with a different data rate. At the MAC layer 
IEEE 802.11 uses both carrier sensing and virtual carrier 
sensing prior to sending data to avoid collisions. 

The scope of the paper is to simulate the AODV and DSR
protocols, analyse their performance under specific
traffic load conditions and scenarios of a wireless sensor network,
and reveal the fundamental tradeoffs of energy, latency,
throughput and fairness under steady state simulations by using
Network Simulator 2 (NS-2).

The remainder of the paper is organized as follows: Section 2
and Section 3 recall the main features of DSR and AODV, respectively.
Section 4 describes the simulation in multiple environments
and the results for energy consumption, latency, throughput and
fairness.


2.0 THE DSR PROTOCOL 

The DSR protocol is composed of two mechanisms that work 
together to allow the discovery and maintenance of source
routes in the ad hoc network: Route Discovery is the 
mechanism by which a source node (S) sending a packet to a 
destination node (D) obtains a route to D [3]. It is used only 
when the route to D is not known. Route Maintenance is the 
mechanism by which node S is able to detect, while using a 
source route to D, if the network topology has changed such 
that it can no longer use its route to D. Route Discovery and 
Route Maintenance each operate entirely on demand. 

When source node S originates a new packet destined to some 
other node D, it will obtain a suitable source route by searching 
its Route Cache of routes previously learned, but if no route is 
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found in its cache, it will initiate the Route Discovery process 
to dynamically find a new route. S transmits a ROUTE 
REQUEST message as a single local broadcast packet, which is 
received by all nodes currently within its range. Each ROUTE 
REQUEST message identifies the initiator and target of the 
Route Discovery, and also contains a unique request id, 
determined by the initiator of the REQUEST. Each ROUTE
REQUEST also contains a record listing the address of each
intermediate node through which this particular copy of the
ROUTE REQUEST message has been forwarded. This route
record is initialized to an empty list. When a node receives a
ROUTE REQUEST, it adds its ID to the discovered route
field and forwards the request or, if it is the target of the Route
Discovery, it returns a ROUTE REPLY message to the source,
containing the entire route. When the nodes in the discovered
route receive this ROUTE REPLY, they cache this route in
their Route Cache for use in sending subsequent packets to this
destination. Thus the entire route is stored in the cache of all
the intermediate nodes in that route along with the source node.
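
The Route Discovery steps described above can be illustrated with a small Python sketch. It models the network as a hypothetical dictionary of radio neighbors and floods the ROUTE REQUEST breadth first, each copy carrying its own route record. Names such as `dsr_route_discovery` and `cache_route` are invented for this sketch, and dropping a request that a node has already seen is an assumption about how the unique request id is used; timers, Route Maintenance and packet formats are omitted.

```python
from collections import deque


def dsr_route_discovery(neighbors, source, target):
    """Flood a ROUTE REQUEST from `source`; return the route carried by the
    first copy that reaches `target`, or None if the target is unreachable.

    neighbors: dict mapping each node to the nodes within its radio range.
    """
    seen = {source}                      # nodes that already saw this request id
    queue = deque([(source, [])])        # (node holding a copy, its route record)
    while queue:
        node, record = queue.popleft()
        for nxt in neighbors[node]:
            if nxt in seen:
                continue                 # duplicate copy of the same request
            seen.add(nxt)
            if nxt == target:
                # The target answers with a ROUTE REPLY containing the whole route.
                return [source] + record + [target]
            # An intermediate node appends its own ID and forwards the request.
            queue.append((nxt, record + [nxt]))
    return None


def cache_route(route_caches, route):
    """Store the discovered route in the Route Cache of the source and of
    every intermediate node on it, keyed by the destination."""
    target = route[-1]
    for node in route[:-1]:
        route_caches.setdefault(node, {})[target] = route


if __name__ == "__main__":
    # Hypothetical five-node topology.
    topology = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "D"],
                "C": ["A", "D"], "D": ["B", "C"]}
    route = dsr_route_discovery(topology, "S", "D")
    caches = {}
    cache_route(caches, route)
    print(route, caches)
```

Because every data packet then carries the complete route, intermediate nodes need no routing table lookups beyond their cache, which is the property contrasted with AODV later in the paper.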
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Figurel: DSR Route Discovery Mechanism 


3.0 THE AODV PROTOCOL 

The AODV routing protocol is designed for use in ad-hoc
mobile networks. AODV is a reactive protocol: the routes are
created and maintained on demand, i.e. only when they are
needed. It uses traditional routing tables, one entry per
destination, and sequence numbers to determine whether
routing information is up-to-date and to prevent routing loops.
AODV uses the distance-vector routing algorithm, which
keeps information only about the next hops to adjacent
neighbors. An important feature of AODV is the maintenance
of time-based states in each node: a routing entry not recently
used is expired. In case a route is broken, the neighbors can
be notified.

Hello messages may be sent to detect and monitor links to 
neighbors. Because nodes periodically send Hello messages, if 
a node fails to receive several Hello messages from a neighbor, 
a link break is detected [4]. When a source has data to transmit 
to an unknown destination, it broadcasts a RREQ to that 
destination. The number of RREQ messages that a node can 
send per second is limited. At each intermediate node, when a 
RREQ is received a route to the source is created. If the 
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receiving node has not received this RREQ before, is no the 
destination and does not have a current route to the destination, 
it rebroadcasts the RREQ [4].

If the receiving node is the destination or has a current route to
the destination, it generates a RREP. The RREP is unicast in a
hop-by-hop fashion to the source. As the RREP propagates,
each intermediate node creates a route to the destination. When
the source receives the RREP, it records the route to the
destination and can begin sending data. If multiple RREPs are
received by the source, the route with the shortest hop count is
chosen. Unlike DSR, the route table entries in the intermediate
nodes on the established path contain only the record of the next
hop along the route instead of the complete route [3].
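
To make the contrast with source routing concrete, the following Python sketch mimics the RREQ and RREP exchange on a hypothetical neighbor dictionary and shows that each node ends up storing only a next hop per destination. It is a simplified illustration, not the protocol itself: sequence numbers, route expiry, Hello messages, RREQ rate limiting and replies from intermediate nodes with a current route are all left out, and the function name `aodv_discover` is invented here.

```python
from collections import deque


def aodv_discover(neighbors, source, dest):
    """Return per-node routing tables {node: {destination: next_hop}} after
    one route discovery, or None if `dest` cannot be reached."""
    tables = {node: {} for node in neighbors}
    seen = {source}                  # nodes that already received this RREQ
    queue = deque([source])
    found = False
    while queue and not found:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt in seen:
                continue             # RREQ received before: not rebroadcast
            seen.add(nxt)
            tables[nxt][source] = node   # reverse route towards the source
            if nxt == dest:
                found = True         # the destination will generate a RREP
                break
            queue.append(nxt)        # intermediate node rebroadcasts the RREQ
    if not found:
        return None

    # The RREP is unicast hop by hop along the reverse route; every node it
    # visits records only the next hop towards the destination.
    node = dest
    while node != source:
        previous = tables[node][source]
        tables[previous][dest] = node
        node = previous
    return tables


if __name__ == "__main__":
    # Hypothetical five-node topology.
    topology = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "D"],
                "C": ["A", "D"], "D": ["B", "C"]}
    print(aodv_discover(topology, "S", "D"))
```

Nodes that received the RREQ but are not on the chosen path keep only a reverse entry towards the source, which in the real protocol would expire if unused.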
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Figure 3: AODV Route Discovery Mechanism. 


4.0 RESULTS AND ANALYSIS

The goal of the experimentation is to reveal the fundamental
tradeoffs of energy, latency, throughput and fairness in AODV
and DSR. All simulations are done using NS-2.27. The radio
power values used to compute energy consumption in idle,
[...] with the RFM TR3000 radio transceiver [7] on Mica Motes.





Table 1: Simulation 
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Table 2: Node Configuration Parameters 
4.1. TWO-HOP SCENARIO 


Figure 4: Two-hop Scenario of 11 nodes 


The two-hop topology is useful to measure the performance of 
protocol when hidden terminals are present [8]. As shown in 
Fig 4, source and sink pairs are arranged around a single 
intermediate node, i.e. node 1. The two-hop topology is of
2500m*500m area. 
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Figure 5: Measurement of Faimesa in 2-hop Sub 


Fig 5 shows the measurement of fairness in the two-hop
scenario. The measurement is done by varying the number of
nodes. The fairness index values for both the protocols coincide
exactly over the entire range. The fairness index reduces
significantly with the increase in number of nodes. With
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increase on network congestion the channel sharing between 
the nodes becomes unequal, resulting in a drop in the fairness
index of the network. It is observed that both the protocols 
respond identically to increasing congestion in the network 
which gives 
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Figure 7: Measurement of Latency in 2-hop topology 
DSR shows higher latency for all values of inter-arrival time,
with the respective latency values showing an overall decrease.
AODV exhibits a drop at the second value and thereafter
remains fairly constant. This may be because in AODV the
node replies to the first arrived RREQ packet and discards all
those received later, thus automatically favoring the least
congested path, whereas in DSR the node accepts all the RREQ
packets and then chooses the shortest path, which is
comparatively more time consuming [3]. Also, DSR requires
more time for obtaining routing information, as each node
consumes more time for processing any control data it receives,
even if it is not the intended receiver.
Throughput for both AODV and DSR reduces with increase in 
inter arrival time as seen in Fig 7. DSR gives better throughput 
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than AODV over the range. The decrease in throughput is rapid This is probably due to the fact that DSR applies the principles 


of promiscuous listening and caching aggressively, which reduces the routing load, thus obtaining higher throughput [5].
The decrease is rapid initially and then becomes gradual. As the inter-arrival time increases, the time for which the network remains idle increases; thus throughput drops in both the cases.
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Figure 8: Comparison of Average Latency for 2hop Topology 
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Figure 9: Measurement of Throughput in 2-hop topology. 
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4.2. MULTI-HOP CHAIN SCENARIO 

The multi-hop scenarios allow the simulation of the complex
interactions that more closely approximate the nature of real
world WSNs [8]. The multi-hop chain topology represents a
system in which the sensor nodes are placed equidistant, for
example along a railway track [4]. The multi-hop chain topology
of 11 nodes is as shown in Fig 7. Here, node 0 is the source and
node 10 is the sink node.
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Figure 12: Measurement of Energy Consumption in multi-hop 
chain topology 


Energy consumption in AODV and DSR varies linearly with
inter-arrival time and is directly proportional to it. It is
consistently higher in AODV over the range of inter-arrival
times and increases at a higher rate than that of DSR. DSR
outperforms AODV in energy consumption. This may be due to its
aggressive approach to promiscuous listening and caching.
Because of this, the nodes can skip much of the routing procedure,
as discussed earlier, thus saving power [5]. In AODV, hello packets
are flooded regularly throughout the network. This leads to
higher power consumption.
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Figure 13: Comparison of Average Energy Consumption for 


Multi-hop Topology


Figure 10: Comparison of Average Throughput for 2hop Topology


Copy Right © BIJIT - 2011 Vol. 3 No. 1 ISSN 0973 - 5658




Simulation And Proportional Evaluation of AODV and DSR in Different Environment of WSN 











[Figure: x-axis, inter-arrival time (sec)]








Figure 15: Comparison of Average Latency for Multi-hop 
Topology 


In the case of the multi-hop topology also, DSR gives higher latency
than AODV.




















Figure 16: Measurement of Hop-Hop Latency in multi-hop 
chain topology 







Hop-to-hop latency is the delay required for every hop. It is
found to be lower for the first hop than for the rest, for which it is
constant. This can be because the source node directly sends
the packet to the next node, whereas the remaining intermediate
[...] destination before [...]
































Figure 17: Throughput versus Inter-arrival time for AODV and 
DSR in multi hop scenario. 


The throughput in the case of the multi-hop topology keeps
decreasing along what appears to be an exponential curve, the
reasons being the same as those mentioned in the case of the 2-hop
topology.











Figure 18 : Comparison of Average Throughput for Multi-hop 
Topology 


5.0 CONCLUSIONS

5.1. From the results obtained, DSR proves advantageous with
respect to energy consumption, with 55.12% lower average
power consumption. Thus DSR is the better choice in
networks deployed in remote or inaccessible areas where
changing the batteries or replacing the nodes is not
practically or economically feasible. Such applications
include environment monitoring, animal tracking etc.

5.2. DSR has poor latency as compared to AODV in both
multi-hop and 2-hop scenarios. Hence the delay in packet
delivery is higher for DSR; thus for time-critical
applications in which on-time delivery of data is of utmost
importance, AODV is preferable over DSR. Such
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. applications include various military, disaster warning, 
and health care applications.

5.3. DSR exhibits nearly 51% higher throughput than AODV.
Higher throughput is desirable in case of data intensive
applications like industrial process monitoring, urban
pollution and traffic monitoring networks etc., which
generate a large amount of data that must reach the
[...]. The DSR protocol gives better performance in such
cases.

5.4. Both AODV and DSR protocols give identical results with
respect to fairness. Thus in topologies with changing node
[...]
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ABSTRACT 

The present work integrates mobile ad-hoc network, wireless 
local area network and cellular network. It balances the load 
among the three networks in the integrated heterogeneous 
environment. It uses a home agent for the selection of the 
optimal network depending upon the type of session of the 
mobile nodes. It uses route selection algorithm to select the 
optimal route in the selected optimal network. Two different 
position based ant colony routing algorithms are proposed for 
mobile ad-hoc network in the present work. Both the routing 
algorithms select an optimal route for a session before starting 
it. They use route maintenance algorithm to detect whether a 
node associated with an existing route is going out of the 
communication range during the ongoing session before the 
existing route fails completely. Such consideration helps to 
reduce the data packet loss for both the algorithms. Our
previous route selection algorithm is used for route selection in
the wireless local area network and cellular network. The
performance of the proposed routing algorithms for mobile ad-hoc
network is compared with the performance of the existing
basic position based ant colony routing algorithm on the basis 
of initial path set up time, average delay and packet loss. The 
performance of the three networks is compared on the basis of 
path set up time and average packet delay in the integrated 
heterogeneous environment. The performance of the proposed 
integrated scheme is evaluated in terms of blocking probability. 


KEYWORDS 
MANET, WLAN, Cellular Network, Basic POSANT routing 
algorithm. 


1.0 INTRODUCTION 

The mobile nodes (MNs) will be equipped with multiple
wireless access technologies in future wireless networks. So
future wireless networks form an integrated heterogeneous
environment where each MN has multiple network interfaces
corresponding to the multiple wireless access technologies
associated with it. Seamless mobility management and
load balancing are the challenging issues for such a
heterogeneous wireless network environment.

Several such integration schemes have been reported so far.
Nair and Jhu [1] introduced network latency, congestion,
battery power, service type as important performance criteria to 
evaluate seamless vertical mobility. 


An end-to-end mobility management system is proposed in [2] 
to reduce unnecessary handoff and ping-pong effect by using 
measurement on the condition of different networks. 
Nasser et al. proposed a vertical handoff decision method [3] that
calculates the service quality for available networks and selects
the network with the highest quality. The vertical handoff
algorithms in [1,3] are not adequate to coordinate the QoS of 
many individual mobile users or adapt to newly emerging 
performance requirements for handoff and changing network 
status. The vertical handoff decision function for heterogeneous 
wireless network in [4] is a measurement of network quality. 
But the authors did not provide any performance analysis. An 
active application oriented handoff decision algorithm [5] was 
proposed for multi interface mobile terminals to reduce the 
power consumption caused by unnecessary handoff and other 
unnecessary interface activation. 
The present work considers the integration of mobile ad-hoc 
network (MANET), wireless local area network (WLAN) and 
cellular network. A mobile router (MR) is associated with each 
MN for maintaining the session in such an integrated 
heterogeneous environment. Each MN in this integrated
environment has three network interfaces. The cellular network 
has the excellence of wide coverage, seamless roaming support 
and better quality of service. The WLAN found its application 
as a low cost high speed solution to cover hot spot like Internet 
cafes, office buildings, apartment buildings etc. to solve the 
wideband data access problem and to utilize the existing 
infrastructure which helps to reduce the implementation cost of 
the network. The cellular network and WLAN provide a single-hop
communication environment whereas MANET helps to
extend this to a multi-hop communication environment [6]. The
MANET provides a multi-hop communication environment to
the MNs without using any existing infrastructure. Moreover,
small, low-cost and low-power MNs are suitable for frequent but
short duration sessions like making a phone call, checking an
appointment schedule etc. So low power consumption is an
important factor for such applications, which can only be
achieved using the MANET environment.

The present work maintains a home agent (HA) to select an
optimal network depending upon the type of application of the
MNs. When a MN wants to initiate a session it sends a session
request message to the HA. This message contains the MN
identification (MN_id) and the type of session. The home agent
selects MANET as the optimal network for the MNs in case of
short duration sessions inside the hot spot cells. It selects
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cellular network as optimal network for the MNs having a 
session with little data for transmission or reception and a long
idle period. Though the power consumption of the network
interface card (NIC) in the MN for upload in the cellular network is
almost two times that of the WLAN network, it is still suitable
because, using the lesser bandwidth of the cellular network, MNs
transmit or receive only a small amount of data. On the other
hand, the WLAN network is suitable for MNs having a session
with a lot of data due to its high speed, high bandwidth and low
cost, which helps to complete transmission or reception using
the same network, which in turn reduces the frequency of
vertical handoff. The power consumption of the NIC in the MN in
idle mode is almost 9 times higher in the WLAN network
in comparison to the cellular network. So it would be more
advantageous to select the cellular network for mobility
management [7] as the energy efficient interface in case of an idle
MN. The HA maintains a session count counter and a block count
counter. It increases the session count counter by 1 after receiving
a session request message from a MN. The home agent increases the
block count counter by 1 if it is unable to select an optimal
network in response to the session request message of the MN
within the time out. The home agent computes the session
blocking probability of the proposed scheme as the ratio of
block count and session count.
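
The selection rule and the two counters can be sketched as follows in Python. This is a minimal illustration under assumptions: the session type labels, the `available` argument (standing in for whether the HA can select a given network before the time out) and the class and method names are hypothetical and do not come from the paper.

```python
class HomeAgent:
    """Toy model of the HA's network selection and blocking statistics."""

    # Assumed mapping from session type to the preferred network.
    PREFERRED = {
        "short_duration": "MANET",
        "little_data_long_idle": "CELLULAR",
        "lot_of_data": "WLAN",
    }

    def __init__(self):
        self.session_count = 0   # incremented for every session request
        self.block_count = 0     # incremented when no network can be selected

    def handle_session_request(self, mn_id, session_type, available):
        """Return the selected network for this request, or None if blocked.

        available: set of networks the HA can currently select within the
        time out (an assumption used to model blocking).
        """
        self.session_count += 1
        network = self.PREFERRED.get(session_type)
        if network is None or network not in available:
            self.block_count += 1
            return None
        return network

    def blocking_probability(self):
        """Ratio of block count to session count (0.0 before any request)."""
        if self.session_count == 0:
            return 0.0
        return self.block_count / self.session_count


if __name__ == "__main__":
    ha = HomeAgent()
    ha.handle_session_request("MN1", "short_duration", {"MANET", "WLAN"})
    ha.handle_session_request("MN2", "lot_of_data", {"MANET"})   # blocked
    print(ha.blocking_probability())
```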

Two different routing algorithms for MANET are proposed in
the present work. These algorithms are discussed in section 2.
The present work uses the route selection algorithm as 
proposed in [8] for cellular network and WLAN. The HA of 
the proposed scheme works as the vertical handoff controller as 
considered in [8] to execute the route selection algorithm. 


2.0 ROUTING ALGORITHMS FOR MANET 
discussion in the following sections. 


2.1 HA POSANT ROUTING ALGORITHM 

The HA is equipped with Google Map [9] and each MN is
equipped with a global positioning system (GPS). A source MN
(S_id) sends a route request message (RRM, as discussed in
section 2.1.1) to the HA for the initiation of a session with a
destination MN (D_id). The HA triggers the route selection
algorithm (as discussed in section 2.1.2) to select an optimal
route in response to the RRM and sends the optimal route to S_id
using a route found message (RFM, as discussed in section 2.1.1).
The home agent assigns a unique session identification (SS_id)
to each session after selecting an optimal route for it. After
receiving the route found message, S_id generates a Type 0 packet
(T0, as discussed in section 2.1.3). The route field (Route) of
RFM and T0 contains the identification of all MNs which are
associated with the optimal route. S_id sends T0 to D_id
through all MNs which are identified in Route of T0. Each MN
in the MANET environment maintains a routing table (as
discussed in section 2.1.4) and inserts a record in the routing
table after receiving a T0. Both S_id and D_id associated with
a particular session generate Type 1 packets (T1, as discussed in
section 2.1.3) and send T1 to each other to maintain the
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bidirectional transmission of packets corresponding to a 
particular session among them using the optimal route which is
mentioned in RFM. The HA maintains a session table (as
discussed in section 2.1.5) to store the information of all the
ongoing sessions among MNs in MANET. The home agent
inserts a record in the session table after selecting an optimal
route. As soon as an ongoing session is over, the S_id associated
with this session sends a session over message (SOM, as
discussed in section 2.1.1) to the HA. The HA searches the session
table for the record whose SS_id attribute matches the
SS_id field mentioned in SOM and deletes that record from
the session table. The HA executes the route maintenance
algorithm (as discussed in section 2.1.6) to detect MN(s) which
are associated with an existing route(s) and are going out of the
communication range of their neighboring MN associated with
the same route during the ongoing session. In such a case the
HA considers the existing route(s) as faulty and executes the route
selection algorithm for the selection of an alternative optimal
route(s) to replace the faulty existing route(s). It sends the
alternative optimal route to the S_id(s) associated with the faulty
existing route(s) using a route maintenance message (RMM, as
discussed in section 2.1.1). After receiving the route maintenance
message, S_id(s) generates a Type 2 packet (T2, as discussed in
section 2.1.3). The new route (N_Route) field of RMM and T2
contains the identification of all MNs which are associated with
the alternative optimal route. S_id sends T2 to D_id
through all MNs which are identified in N_Route of T2 for
necessary insertion or modification in their routing tables.


2.1.1 MESSAGE EXCHANGE AMONG VARIOUS MNS 
RRM contains S_id and D_id fields. RFM contains S_id, D_id,
SS_id and Route fields. The MR associated with S_id uses the
optimal route as mentioned in Route of RFM for packet
transmission corresponding to a particular session which is
identified by SS_id field. SOM contains SS_id and F_flag
fields. The F_flag field of SOM is set to indicate the end of
session which is identified by the SS_id field. RMM contains
SS_id, S_id and N_Route fields. The MR associated with S_id
uses the alternative optimal route as mentioned in N_Route of
RMM for packet transmission corresponding to a particular
session which is identified by SS_id field.


2.1.2 ROUTE SELECTION ALGORITHM

The GPS detects the current location in terms of longitude and 
latitude of each MN. The GPS sends this information of each 
MN to HA as soon as the current location of any MN changes. 
The home agent uses Vincenty's inverse equation [10] to 
calculate the distance between two neighboring mobile nodes 
using their current location which is obtained from GPS. The 
HA maintains a rectangular boundary around the MNs and the 
Google Map in HA shows the real time image of each MN 
provided by GPS. If any intruder MN crosses the rectangular 
boundary from outside HA sends a special security signal to the 
MN(s) closer to the intruder MN. The HA maintains a graph of 
MNs using their real time image which is provided by the 
Google Map continuously. After receiving RRM the HA
determines all possible routes from the source MN, which is
identified by S_id, to the destination MN, which is identified
by D_id in RRM. The HA counts the number of MNs in each
possible route and selects the route having minimum number
of MNs as the best route.
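
The minimum-MN selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the adjacency-list representation of the HA's graph and the names `all_routes` and `select_route` are hypothetical, and the depth first enumeration is chosen because section 2.3.3 attributes a depth first search to the HA POSANT algorithm.

```python
from typing import Dict, List, Optional

# Hypothetical adjacency list: MN id -> ids of neighboring MNs.
Graph = Dict[str, List[str]]


def all_routes(graph: Graph, src: str, dst: str) -> List[List[str]]:
    """Enumerate every loop-free route from src to dst (iterative depth first search)."""
    routes: List[List[str]] = []
    stack = [[src]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == dst:
            routes.append(path)
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # never revisit an MN on the same route
                stack.append(path + [neighbor])
    return routes


def select_route(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Return the route containing the fewest MNs, or None if dst is unreachable."""
    routes = all_routes(graph, src, dst)
    return min(routes, key=len) if routes else None


# Example topology (assumed): two routes exist from A to D.
manet = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}
best = select_route(manet, "A", "D")
```

With this topology the route through B contains three MNs and the route through C and E contains four, so the shorter one is selected.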


2.1.3 TYPE OF PACKETS

T0 contains SS_id, S_id, D_id, Type and Route fields. The
Type field indicates the type of the packet as Type 0. T1
contains SS_id, Node_id, S_No, Type and PAYLOAD fields.
The Node_id field of this packet is S_id in case the packet is
generated by the source MN and D_id in case the packet is
generated by the destination MN. The S_No field indicates the
sequence number of the packet and Type field indicates the
type of the packet as Type 1. The PAYLOAD [...] to the [...]
which is identified by the SS_id field. T2 contains SS_id,
S_id, D_id, S_No, Type, N_Route and PAYLOAD fields.


2.1.4 ROUTING TABLE
Each record in the routing table has 5 attributes as shown in
TABLE-1. The attributes SN_NH and DN_NH are the source [...]

Let TABLE-1 be the routing table which is maintained by the jth
MN, showing a record for the sth session. The S_id and D_id
which are associated with the sth session are identified as S and D
respectively in TABLE-1. T indicates the next hop of the jth
MN in case of transmission from S to D and E indicates the
next hop of the jth MN in case of transmission from D to S in
TABLE-1. After receiving a T0 the jth MN inserts a record in
TABLE-1. After receiving a T1 the jth MN searches TABLE-1
for the existing record whose SS_id attribute matches with the
SS_id field as mentioned in T1. It compares the S_id attribute
and the D_id attribute of the existing record with the Node_id
field as mentioned in T1. If the Node_id field in T1 matches
with the S_id attribute of the existing record the jth MN
forwards the packet to T and if the Node_id field in T1 matches
with the D_id attribute of the existing record the jth MN
forwards the packet to E. After receiving T2 the jth MN
searches the routing table for the existing record whose SS_id
attribute matches with the SS_id field as mentioned in T2. If
found it updates the record by replacing the old route attribute
by the new route attribute as mentioned in T2. Otherwise, it
inserts a new record in the routing table. When a MN is not
participating in packet transmission corresponding to a
particular session, it deletes the corresponding record from the
routing table.
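
The handling of T0, T1 and T2 at an intermediate MN can be illustrated with a small sketch. It is not the authors' implementation: the class and method names are hypothetical, and it assumes that the Route (or N_Route) field lists the MNs in order from S_id to D_id, so that the two next hops of an MN can be read off its position in that list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    ss_id: int
    s_id: str
    d_id: str
    toward_d: Optional[str]  # next hop for S -> D traffic ("T" in the text)
    toward_s: Optional[str]  # next hop for D -> S traffic ("E" in the text)


class RoutingTable:
    """Routing table of one MN, keyed by SS_id (one record per session)."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.records: Dict[int, Record] = {}

    def _store(self, ss_id: int, s_id: str, d_id: str, route: List[str]) -> None:
        # Assumption: route is ordered from source to destination and contains this MN.
        i = route.index(self.node_id)
        toward_d = route[i + 1] if i + 1 < len(route) else None
        toward_s = route[i - 1] if i > 0 else None
        self.records[ss_id] = Record(ss_id, s_id, d_id, toward_d, toward_s)

    def on_t0(self, ss_id: int, s_id: str, d_id: str, route: List[str]) -> None:
        """T0 received: insert a record for the new session."""
        self._store(ss_id, s_id, d_id, route)

    def on_t2(self, ss_id: int, s_id: str, d_id: str, n_route: List[str]) -> None:
        """T2 received: replace the existing record, or insert one if none exists."""
        self._store(ss_id, s_id, d_id, n_route)

    def next_hop_for_t1(self, ss_id: int, node_id: str) -> Optional[str]:
        """T1 received: choose the next hop from the Node_id field of the packet."""
        rec = self.records.get(ss_id)
        if rec is None:
            return None
        if node_id == rec.s_id:
            return rec.toward_d
        if node_id == rec.d_id:
            return rec.toward_s
        return None

    def drop_session(self, ss_id: int) -> None:
        """Delete the record once this MN no longer takes part in the session."""
        self.records.pop(ss_id, None)


table = RoutingTable("B")
table.on_t0(ss_id=7, s_id="A", d_id="C", route=["A", "B", "C"])
hop_from_source = table.next_hop_for_t1(7, "A")   # forwarded toward D_id
hop_from_dest = table.next_hop_for_t1(7, "C")     # forwarded toward S_id
```

Keying the table by SS_id keeps a single record per bidirectional session, which is the property section 2.3.1 relies on when comparing storage requirements.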


2.1.5 SESSION TABLE
Each record in the session table has 3 attributes as shown in
TABLE-2. The Route attribute contains the identification of all
MNs which are associated with the selected optimal route
starting from S_id to D_id for the packet transmission
corresponding to a particular session as identified by SS_id.


2.1.6 ROUTE MAINTENANCE ALGORITHM
The HA computes the distance between the two neighboring
MNs continuously using the information provided by GPS and
using the Vincenty's inverse equation. The HA considers a MN
as MOVE_NODE in case its distance from the neighboring
MN crosses a threshold. The threshold distance is computed
during simulation as discussed in section 3.1.2. As soon as HA
detects such a MN, it searches the Route attribute of all the
records in the session table. It selects the record(s) whose
Route attribute contains the identification of the
MOVE_NODE. If found it retrieves the selected record(s). It
executes route selection algorithm for the selection of an
alternative optimal route(s) before the existing route(s) fails
completely. Such advance selection of an alternative optimal
route helps to reduce packet loss of a session. The HA updates
the selected record(s) by replacing the old route attribute by the
new route attribute in the session table.
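
The detection step of this maintenance algorithm can be sketched as follows. The sketch makes several assumptions that are not in the paper: the session table is modelled as a dictionary from SS_id to Route, the distance between two MNs is supplied by a caller-provided function (standing in for Vincenty's inverse equation applied to GPS positions), both MNs of an over-stretched link are flagged as MOVE_NODE, and the default threshold is the 99 m value derived in section 3.1.2.

```python
from typing import Callable, Dict, List, Set

# Hypothetical layout of the session table: SS_id -> Route (ordered MN ids).
SessionTable = Dict[int, List[str]]


def detect_move_nodes(
    sessions: SessionTable,
    distance_m: Callable[[str, str], float],
    threshold_m: float = 99.0,
) -> Set[str]:
    """Flag MNs whose distance to a route neighbor exceeds the threshold."""
    moved: Set[str] = set()
    for route in sessions.values():
        for a, b in zip(route, route[1:]):
            if distance_m(a, b) > threshold_m:
                moved.update((a, b))  # assumption: both ends of the link are flagged
    return moved


def affected_sessions(sessions: SessionTable, move_node: str) -> List[int]:
    """Return the SS_id of every record whose Route contains the MOVE_NODE."""
    return sorted(ss_id for ss_id, route in sessions.items() if move_node in route)


# Example data (assumed): link B-C is about to exceed the communication range.
sessions = {1: ["A", "B", "C"], 2: ["A", "D"]}
gaps = {
    frozenset(("A", "B")): 50.0,
    frozenset(("B", "C")): 99.5,
    frozenset(("A", "D")): 10.0,
}
moved = detect_move_nodes(sessions, lambda a, b: gaps[frozenset((a, b))])
to_reroute = affected_sessions(sessions, "C")
```

Each session returned by `affected_sessions` would then be handed to the route selection algorithm so that its Route attribute can be replaced before the link fails.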

The installation of Google Map along with GPS increases the
cost of the system. Moreover the GPS may not be able to work
properly in situations such as underwater conditions e.g. within
submarines. In such a situation radio detection and ranging
(RADAR) works well. The RADAR POSANT routing
algorithm is considered for discussion in section 2.2.



2.2 RADAR POSANT ROUTING ALGORITHM

Each MN is equipped with two antennas, one at the front end
and one at the rear end of MN. Both the antennas can work as
transmitter as well as receiver to achieve bidirectional
transmission of packets corresponding to a particular session.
One of the MNs works as a special fixed node (SFN). It
maintains route [...] and is not taking part in
communication.

The S_id initiates route selection algorithm (as discussed in
section 2.2.1) by forwarding ant packet [11] towards D_id for
the initiation of a session as in basic POSANT routing
algorithm. The D_id selects an optimal route and sends the
optimal route to SFN using D_to_SFN message (as discussed
in section 2.2.2). The SFN sends the optimal route to S_id
using SFN_to_S message (as discussed in section 2.2.2). The
SFN assigns a unique SS_id to each session after receiving the
optimal route from D_id. After receiving SFN_to_S message
S_id generates T0 (as discussed in section 2.1.3). The route
field (Route) of D_to_SFN message, SFN_to_S message and
T0 contain the identification of all MNs which are associated
with the optimal route. The S_id sends T0 to D_id through all
MNs which are identified in Route of T0. Each MN maintains a
routing table (as discussed in section 2.1.4) and inserts a record
in the routing table after receiving a T0. Both S_id and D_id
associated with a particular session generate T1 (as discussed
in section 2.1.3) and send T1 to each other to maintain the
bidirectional transmission of packets corresponding to a
particular session among them using the optimal route as
mentioned in SFN_to_S message. The SFN maintains a session
table (as discussed in section 2.1.5) to store the information of
all the ongoing sessions among MNs in MANET. The SFN
inserts a record in the session table after receiving D_to_SFN
message. As soon as an ongoing session is over S_id associated
with this session sends SOM (as discussed in section 2.2.2) to
SFN. The SFN searches the session table for the record whose
SS_id attribute matches with the SS_id field as mentioned in
SOM and deletes that record from the session table. Each MN
associated with an existing route executes route maintenance
algorithm (as discussed in section 2.2.3) to detect whether its
neighboring MN associated with the same route is going out of
the communication range during the ongoing session and sends
an alarming signal to the neighboring MN. In response the
neighboring MN sends its identification to SFN. In such a case
the SFN considers the existing route as faulty and sends
SFN_ALT_ROUTE message (as discussed in section 2.2.2) to
S_id which is associated with the faulty route for the execution
of the route selection algorithm. The S_id executes route
selection algorithm for the selection of an alternative optimal
route to replace the faulty existing route. After selecting the
alternative optimal route S_id generates T2 (as discussed in
section 2.1.3). The N_Route of T2 contains the identification of
all MNs which are associated with the alternative optimal route
as selected by S_id. The S_id sends T2 to D_id through all
MNs which are identified in N_Route of T2 for necessary
insertion or modification in their routing table.



2.2.1 ROUTE SELECTION ALGORITHM

The S_id forwards the ant packet through all the possible routes
between S_id and D_id associated with a particular session as
in basic POSANT routing algorithm. The ant packet deposits
pheromone value to each link. The maximum pheromone value
is deposited to the link having smallest length. The ant packet
has 6 fields as shown in Fig.1.

Figure 1: Format of ant packet

Let the ith MN receive an ant packet from the kth MN and let
the jth MN be the successor of the ith MN. The ith MN mentions
the current time stamp in the T_S field of the ant packet and
forwards it to the jth MN. The ith MN adds its identification in
the Route field of the ant packet. The ith MN computes the
difference in time stamp (Diff_time) between the current time
stamp corresponding to the time of receiving the ant packet by
it and the time stamp in the T_S field of the ant packet as
mentioned by the kth MN. The ith MN also computes its
distance from the kth MN (D_ik) by multiplying Diff_time and
the speed of electromagnetic signal (m/sec). The bit error rate
increases rapidly when the distance between the two
neighboring MNs in the WLAN environment is greater than 45
meters [12]. So in the present work the pheromone value of the
link between the ith MN and the kth MN (P_value_ik) is
assumed as 20 (any value >1 shows the identical performance)
if D_ik<45 otherwise it is assumed as 1. The ith MN also
multiplies the value in the P_C field of the ant packet as
mentioned by the kth MN by P_value_ik. At the ith MN the
value in P_C field of the ant packet indicates the pheromone
concentration of the route from S_id up to the ith MN.

The D_id receives multiple ant packets through all possible
routes between source and destination. It compares the P_C
value of all the received ant packets. The route field in the ant
packet having maximum P_C value is selected as the optimal
route.
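
The pheromone bookkeeping described in this section can be sketched as below. The function names are hypothetical, the signal speed is taken as 3 x 10^8 m/sec, and the initial value 1 of the P_C field at S_id is an assumption (the text only states that each MN multiplies the field by the pheromone value of the incoming link).

```python
from typing import List, Sequence, Tuple

SIGNAL_SPEED_M_PER_S = 3e8   # speed of the electromagnetic signal
RANGE_LIMIT_M = 45.0         # beyond this distance the bit error rate rises rapidly
HIGH_PHEROMONE = 20          # value assumed in the text for a short link
LOW_PHEROMONE = 1            # value assumed in the text for a long link


def link_pheromone(diff_time_s: float) -> int:
    """Pheromone value of one link, derived from the time-stamp difference."""
    distance_m = diff_time_s * SIGNAL_SPEED_M_PER_S
    return HIGH_PHEROMONE if distance_m < RANGE_LIMIT_M else LOW_PHEROMONE


def pheromone_concentration(diff_times_s: Sequence[float]) -> int:
    """P_C of a route: product of the pheromone values of its links."""
    p_c = 1  # assumed initial value of the P_C field at S_id
    for diff_time in diff_times_s:
        p_c *= link_pheromone(diff_time)
    return p_c


def select_optimal_route(ants: List[Tuple[List[str], List[float]]]) -> List[str]:
    """D_id keeps the Route field of the ant packet with the maximum P_C."""
    best_route, _ = max(ants, key=lambda ant: pheromone_concentration(ant[1]))
    return best_route


# Two ant packets (assumed data): each carries a route and its per-link Diff_time values.
ants = [
    (["S", "A", "D"], [1e-7, 1e-7]),   # two links of about 30 m each
    (["S", "B", "D"], [1e-7, 2e-7]),   # second link is about 60 m
]
optimal = select_optimal_route(ants)
```

Here the first route accumulates a concentration of 400 and the second only 20, so the route through A is chosen.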


2.2.2 MESSAGE EXCHANGE AMONG VARIOUS MNs 
D_to_SFN message has S_id, SFN_id and Route fields.
SFN_id field indicates the identification of SFN. SOM contains
SS_id and F_flag. SFN_to_S message contains SS_id, S_id,
SFN_id and Route fields. The SFN_ALT_ROUTE message
contains S_id, SFN_id and SS_id fields.


2.2.3 ROUTE MAINTENANCE ALGORITHM
Each MN associated with an existing route computes its
distance from its neighboring MN which is associated with the
same route using mono-static equation [13]. The mono-static
equation is as follows:

$$10\log_{10} P_r = 10\log_{10}\left[\frac{P_t G_t G_r \lambda^2 \sigma}{(4\pi)^3 R^4}\right] = 10\log_{10}\left[P_t G_t G_r \left(\frac{\sigma c^2}{(4\pi)^3 f^2 R^4}\right)\right]$$

where, P_r = Received peak power
P_t = Transmitted peak power
G_t = Gain of transmitter antenna (dBi)
G_r = Gain of receiver antenna (dBi)
λ = Transmitted wavelength (m, cm, in, etc.)
σ = Radar cross-section of target - RCS (m², cm², in², etc.)
R = Range (m, cm, in, etc.), c = speed of light, f = c/λ
The parameter values of the mono-static equation are assumed
as follows: P_t = 20 dBm, G_t = G_r = 16 dBi, λ = 15 cm,
σ = 2.5 m² and c = 3×10^8 meter/sec [14,15,16,17]. The
parameter R indicates the distance between the two
neighboring MNs. P_r is measured at the receiving antenna and
R is computed using the mono-static equation and the known
value of all the other parameters.
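
Solving the mono-static equation for R gives R = [P_t G_t G_r λ² σ / ((4π)³ P_r)]^(1/4) once all quantities are expressed in linear units. The following sketch shows this inversion with the parameter values assumed above; the function names and the conversion helpers are illustrative and not taken from the paper.

```python
import math


def dbm_to_watt(dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 10 ** ((dbm - 30) / 10)


def dbi_to_linear(dbi: float) -> float:
    """Convert an antenna gain in dBi to a linear ratio."""
    return 10 ** (dbi / 10)


def received_power_w(range_m: float, pt_dbm: float = 20.0, gt_dbi: float = 16.0,
                     gr_dbi: float = 16.0, wavelength_m: float = 0.15,
                     rcs_m2: float = 2.5) -> float:
    """Mono-static radar equation: received peak power for a target at range_m."""
    numerator = (dbm_to_watt(pt_dbm) * dbi_to_linear(gt_dbi) * dbi_to_linear(gr_dbi)
                 * wavelength_m ** 2 * rcs_m2)
    return numerator / ((4 * math.pi) ** 3 * range_m ** 4)


def radar_range_m(pr_watt: float, pt_dbm: float = 20.0, gt_dbi: float = 16.0,
                  gr_dbi: float = 16.0, wavelength_m: float = 0.15,
                  rcs_m2: float = 2.5) -> float:
    """Invert the mono-static equation: range R from the measured received power."""
    numerator = (dbm_to_watt(pt_dbm) * dbi_to_linear(gt_dbi) * dbi_to_linear(gr_dbi)
                 * wavelength_m ** 2 * rcs_m2)
    return (numerator / ((4 * math.pi) ** 3 * pr_watt)) ** 0.25


# Round trip: the power received from a neighbor 50 m away maps back to 50 m.
measured = received_power_w(50.0)
estimated_range = radar_range_m(measured)
```

Because the received power falls with the fourth power of the range, small measurement errors in P_r translate into comparatively small errors in R.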

Each MN associated with an existing route also computes its
angle with its neighboring MN which is associated with the
same route, using Pythagoras theorem. In ΔABC (Fig.2) the
vertex B and the vertex C represent the location of the front end
and rear end antenna in a MN. The vertex A represents the
location of the neighboring MN. In ΔABC the side AB (=c)
represents the distance between the neighboring MN and the
front end antenna. P_r is measured at the front end antenna and c
is computed using mono-static equation. The side AC (=b)
represents the distance between the neighboring MN and the
rear end antenna. P_r is measured at the rear end antenna and b is
computed using mono-static equation. The side BC (=a)
represents the length of MN. AP (=h) is perpendicular to BC.


Figure 2: Triangular representation of the angle calculation
process.


The angle between the two neighboring MNs (angle C) is C =
[...]. A MN sends an alarming signal to its neighboring MN
(RECEIVED_NODE) [...] by the Pythagoras theorem in case
its distance from the RECEIVED_NODE crosses a threshold.
The threshold distance is computed during simulation as
discussed in section 3.1.2. The RECEIVED_NODE sends its
Node_id to SFN. The SFN searches the Route attribute of all
the records in the session table. It selects the record(s) whose
Route attribute contains the identification of the
RECEIVED_NODE. If found it retrieves the selected record(s)
and sends SFN_ALT_ROUTE message to S_id(s) associated
with the selected record(s) to execute route selection algorithm
for the selection of an alternative optimal route(s) before the
existing route(s) fails completely. After the selection of the
alternative optimal route by D_id, SFN receives D_to_SFN
message and updates the selected record(s) by replacing the old
route attribute by the new route attribute corresponding to the
alternative optimal route in the session table.
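
The angle computation in ΔABC can be illustrated with a short sketch. The closed-form expression used in the paper is not legible in this copy, so the sketch substitutes the standard law of cosines for the three known sides a = BC, b = AC and c = AB, and then obtains the perpendicular h = AP from the right triangle APC. The function names are hypothetical.

```python
import math


def angle_c_deg(a: float, b: float, c: float) -> float:
    """Angle at vertex C (rear end antenna) from the three side lengths, in degrees."""
    cos_c = (a * a + b * b - c * c) / (2 * a * b)
    cos_c = max(-1.0, min(1.0, cos_c))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_c))


def height_h(a: float, b: float, c: float) -> float:
    """Length of AP, the perpendicular from the neighboring MN onto BC."""
    return b * math.sin(math.radians(angle_c_deg(a, b, c)))


# Example (assumed lengths): a 3 m long MN, neighbor 4 m from the rear antenna
# and 5 m from the front antenna, which places the neighbor at a right angle.
angle = angle_c_deg(3.0, 4.0, 5.0)
h = height_h(3.0, 4.0, 5.0)
```

In practice b and c would come from the mono-static equation applied to the power measured at the rear end and front end antennas respectively.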


2.3 COMPARISON OF ROUTING ALGORITHMS 

In this section the basic POSANT routing algorithm [11], HA
POSANT routing algorithm and RADAR POSANT routing
algorithm are compared on the basis of storage requirement,
routing table searching time and the time complexity of the
algorithm.


2.3.1 STORAGE REQUIREMENT 

In basic POSANT routing algorithm each MN maintains a
forward routing table to send packets from source MN to
destination MN and a backward routing table to send packets
from destination MN to source MN. Each record in the routing
table has 3 attributes as shown in TABLE-3. Let TABLE-3 be
the forward routing table at the jth MN. The Node_Address
attribute is the address of the destination MN in case of forward
routing table. The Next_Hop attribute is the address of the next
hop MN from the jth MN towards destination which is identified
by the Node_Address attribute. The Pheromone_Value attribute
indicates the pheromone value corresponding to the next hop
MN which is indicated by the Next_Hop attribute. The
Node_Address attribute and the Next_Hop attribute are 128 bit
IPv6 addresses. The maximum pheromone value which is
deposited to a link is 20 as discussed in the section 2.2.1 and
the number of bits required to represent the maximum
pheromone value is 5. So the length of each record in the
forward routing table at any MN is 261 bits. The number of
records in the forward routing table at the jth MN for a single
session depends upon the number of possible next hop MNs
from the jth MN towards destination. So the storage requirement
per forward routing table is (261*number of possible next hop
towards destination MN) bits.





Let TABLE-3 be the backward routing table at the jth MN. The
Node_Address attribute is the address of source MN in case of
backward routing table. The Next_Hop attribute is the address
of the next hop MN from the jth MN towards source which is
identified by the Node_Address attribute. The number of
records in the backward routing table at the jth MN for a single
session depends upon the number of possible next hop MNs
from the jth MN towards source. The storage requirement per
backward routing table is (261*number of possible next hop
towards source MN) bits. So the storage requirement for each
bidirectional session is 261*(number of possible next hop
towards destination + number of possible next hop towards
source) bits.

In HA POSANT routing algorithm and RADAR POSANT
routing algorithm each MN maintains a single routing table as
shown in TABLE-1. The S_id, D_id, SN_NH and DN_NH are
128 bit IPv6 addresses. Now for 1000 different bidirectional
sessions the number of bits required to represent SS_id is 10.
So the length of each record in the routing table is
522 bits. There is a single record for each bidirectional session
in the routing table and so the storage requirement for each
bidirectional session is 522 bits. The storage requirement for
each bidirectional session in basic POSANT routing algorithm
is greater than that in HA POSANT routing algorithm and
RADAR POSANT routing algorithm if the number of next hop
MNs from the jth MN towards source or destination is greater
than unity in TABLE-3.
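
The record sizes quoted in this section follow from simple arithmetic, which the sketch below reproduces. The helper names are hypothetical; SS_id values are assumed to run from 0 to 999 for 1000 sessions.

```python
IPV6_BITS = 128

# Bits needed for the largest pheromone value (20) and the largest SS_id (999).
PHEROMONE_BITS = (20).bit_length()    # 5
SS_ID_BITS = (1000 - 1).bit_length()  # 10

# Basic POSANT record: Node_Address + Next_Hop + Pheromone_Value.
POSANT_RECORD_BITS = 2 * IPV6_BITS + PHEROMONE_BITS

# HA / RADAR POSANT record: S_id + D_id + SN_NH + DN_NH + SS_id.
PROPOSED_RECORD_BITS = 4 * IPV6_BITS + SS_ID_BITS


def posant_session_bits(next_hops_to_dest: int, next_hops_to_source: int) -> int:
    """Storage per bidirectional session in basic POSANT (forward + backward tables)."""
    return POSANT_RECORD_BITS * (next_hops_to_dest + next_hops_to_source)


single_hop_case = posant_session_bits(1, 1)
multi_hop_case = posant_session_bits(2, 1)
```

With exactly one candidate next hop in each direction the two schemes need the same 522 bits per session; as soon as either direction offers more than one next hop, basic POSANT needs more.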


2.3.2 ROUTING TABLE SEARCHING TIME

Let in case of basic POSANT routing algorithm the number of
forward ongoing sessions through the jth MN as an intermediate
MN be m and the number of next hops from the jth MN towards
destination be n. So at the jth MN the forward routing table
contains m*n records and the time complexity to select the
desired record from the forward routing table is O(log2(m*n)).
The jth MN compares the pheromone value of all the n
next hops and selects the optimal next hop having the
maximum pheromone value. The link between the jth MN and the
selected optimal next hop is considered as the optimal outgoing
link towards destination. The time complexity to select the
optimal outgoing link from the forward routing table at the jth MN
is O(n^2). So the total time complexity at the jth MN for the
selection of an optimal outgoing link is O(log2(m*n) + n^2).
In case of HA POSANT routing algorithm and RADAR
POSANT routing algorithm the routing table at the jth MN
contains m records and the time complexity to select the
desired record from the routing table is O(log2(m)).

So the time complexity of searching the routing table is higher
in basic POSANT routing algorithm than in HA POSANT
routing algorithm and RADAR POSANT routing algorithm.


2.3.3 TIME COMPLEXITY OF THE ALGORITHM 

In case of basic POSANT routing algorithm the routing table at
each MN contains the possible next hops and their pheromone
values. During the ongoing session the routing table at each MN
is searched for the selection of an optimal outgoing link. In
case of HA POSANT routing algorithm and RADAR POSANT
routing algorithm the routing table at each MN contains the
optimal route. During the ongoing session the routing table at
each MN is searched for the optimal route. So the optimal route
is selected during the ongoing session in basic POSANT
routing algorithm, which makes its time complexity higher than
that of the HA POSANT routing algorithm and RADAR POSANT
routing algorithm. The time complexity of the HA POSANT
routing algorithm is higher than the time complexity of the
RADAR POSANT routing algorithm due to the time complexity
of the depth first search.


3.0. SIMULATION 

The simulation experiment is performed in two different 
phases. The performance of the basic POSANT [11] routing 
algorithm, HA POSANT routing algorithm and RADAR 
POSANT routing algorithm are compared in Phase 1. The 
performance of the integrated heterogeneous environment has 
been studied in Phase 2. The simulation experiment is
conducted for 1280 packets and 6 MNs in both the phases.


3.1 EXPERIMENTAL RESULTS FOR PHASE 1 
The simulation experiment is conducted to compare the 
performance of the three routing algorithms for MANET.


3.1.1 INITIAL PATH SET UP TIME 

It is the time to set up an optimal route for the initiation of a 
session. Fig.3 shows the plot of initial path set up time for all 
the three routing algorithms. The basic POSANT routing 
algorithm needs the transmission of forward ant packets and 
backward ant packets for route selection. The optimal outgoing 
link corresponding to the optimal route is decided from the 
pheromone value in the ant packets. The HA POSANT routing 
algorithm needs the transmission of RRM and RFM among 
MNs for route selection instead of the transmission of forward 
ant packets and backward ant packets. The transmission of
RRM and RFM selects the optimal route instead of the optimal
outgoing link, which reduces the initial path set up time of HA
POSANT routing algorithm relative to basic POSANT routing
algorithm. The RADAR POSANT routing algorithm needs the
transmission of forward ant packets for initial route selection,
which increases the initial path set up time of RADAR
POSANT routing algorithm relative to HA POSANT routing
algorithm. But the transmission of backward ant packets is not
required in RADAR POSANT routing algorithm, which reduces
the initial path set up time of RADAR POSANT routing
algorithm relative to basic POSANT routing algorithm.


3.1.2 AVERAGE PACKET DELAY

Fig.4 shows the plot of average packet delay vs. simulation
time for all the three routing algorithms. It can be observed
from Fig.4 that the average packet delay is higher in basic
POSANT routing algorithm as it selects the optimal route
during the ongoing session than the other two routing
algorithms.

Figure 3: Initial path set up times

Fig.5 shows the plot of average packet delay vs. the number of
packets received for all the three routing algorithms. The speed
of MN is assumed as 6 km/hr [18,19,20]. If a MN associated
with the optimal route of a particular session starts to move in
the opposite direction of another MN associated with the same
route, their relative velocity becomes 12 km/hr. The
communication range of WLAN is assumed as 100 m [21]. So
the failure occurs in the existing route when the two
neighbouring MNs associated with the same route go out of the
communication range with relative velocity 12 km/hr after 30
sec. It can be observed from Fig.3 that the initial path set up
time for HA POSANT routing algorithm is 120 msec and for
RADAR POSANT routing algorithm is 150 msec. The two
neighbouring MNs having relative velocity 12 km/hr cover a
distance of 0.4 m (< 1 m) in 120 msec for HA POSANT
routing algorithm and 0.5 m (< 1 m) in 150 msec for RADAR
POSANT routing algorithm. So the packet loss and average
packet delay of an ongoing session can be minimized by
triggering the route maintenance algorithm in advance when
the two neighbouring MNs associated with the same optimal
route are at a threshold distance of 99 m (100 m - 1 m) from
each other. During simulation it has been observed that the
time required to transmit a single packet using basic POSANT
routing algorithm is 40 msec whereas the time required for
transmitting a single packet using HA POSANT routing
algorithm and RADAR POSANT routing algorithm is 30 msec.
So the number of packets that can be transmitted using basic
POSANT routing algorithm in 30 sec is 700 whereas the
number of packets that can be transmitted using HA POSANT
routing algorithm and RADAR POSANT routing algorithm in
30 sec is 950 before the failure occurs in the existing route.
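
The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check with hypothetical names; the packet counts it produces are idealized ceilings obtained by dividing the 30 sec window by the per-packet time, and the counts reported from the simulation (700 and 950) lie below these ceilings.

```python
def kmh_to_ms(speed_kmh: float) -> float:
    """Convert km/hr to m/sec."""
    return speed_kmh * 1000.0 / 3600.0


RANGE_M = 100.0                                    # assumed WLAN communication range
relative_speed_ms = kmh_to_ms(6.0) + kmh_to_ms(6.0)  # two MNs moving apart at 6 km/hr each

# Time until two neighbors that start together drift out of range.
time_to_failure_s = RANGE_M / relative_speed_ms


def drift_m(setup_time_msec: float) -> float:
    """Distance the two MNs move apart while a new path is being set up."""
    return relative_speed_ms * setup_time_msec / 1000.0


def packet_ceiling(per_packet_msec: float) -> int:
    """Upper bound on packets sent before the failure, ignoring all overheads."""
    return round(time_to_failure_s / (per_packet_msec / 1000.0))


drift_ha = drift_m(120.0)       # HA POSANT path set up time
drift_radar = drift_m(150.0)    # RADAR POSANT path set up time
ceiling_basic = packet_ceiling(40.0)
ceiling_proposed = packet_ceiling(30.0)
```

Both drift values stay below 1 m, which is why a 1 m margin (threshold 99 m) is enough to complete the alternative route selection before the link breaks.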

It can be observed from Fig.5 that the initial average packet
delay is higher in basic POSANT routing algorithm due to its
higher initial path set up time as discussed in section 3.1.1 than
the other two routing algorithms. The new route is selected in
basic POSANT routing algorithm after the transmission of 700
packets. The new route is selected in HA POSANT routing
algorithm and RADAR POSANT routing algorithm after the
transmission of 950 packets. The average packet delay in the
new route for basic POSANT routing algorithm is also higher
than that of the other two routing algorithms.

Figure 5: Average packet delay vs. Number of packets received


3.1.3 PERCENTAGE OF SUCCESSFULLY DELIVERED
PACKETS
TABLE-[...] shows the percentage of successfully delivered
packets for the 3 routing algorithms. The new route discovery
process starts after the failure occurs in the existing route in
basic POSANT routing algorithm. So the data packets that are
generated during the time interval between the occurrence of
route failure and finding out a new route are lost. The route
maintenance algorithm selects an alternative optimal route in
advance before the failure occurs in the existing route in HA
POSANT routing algorithm and RADAR POSANT routing
algorithm. So the percentage of successfully delivered packets
is less in basic POSANT routing algorithm than in the other
two routing algorithms.





3.2 EXPERIMENTAL RESULTS FOR PHASE 2
The simulation experiment is conducted to evaluate the
performance of the integrated heterogeneous environment [...]
network. It maintains a HA to select an optimal network


3.2.1 PATH SET UP TIME
Fig.6 shows the path set up time for all the networks in the
integrated heterogeneous environment. The path set up time in
the cellular network and in WLAN is higher than the path set
up time in MANET due to the infrastructure access overhead.
The path set up times in cellular network and WLAN
are identical as they are using the same route selection
algorithm [8].

3.2.2 AVERAGE PACKET DELAY
Fig.7 shows the plot of average packet delay vs. simulation
time for all the three networks in the integrated network
environment. The average packet delay is lesser in MANET
due to its lesser path set up time. The average packet delay of
WLAN and cellular network is slightly higher than MANET,
due to higher path set up time and the overhead of executing
the route selection algorithm [8]. But the average packet
delay of WLAN is lesser than cellular network due to its high
speed.

Figure 7: Average packet delay vs. simulation time


3.2.3 SESSION BLOCKING PROBABILITY

Fig.8 shows the plot of session blocking probability vs. the
number of sessions of the proposed scheme. It increases with
the number of sessions. The blocking probability in [22] is 90%.
The maximum session blocking probability of the proposed
scheme is 60%.

Figure 8: Session blocking probability vs. number of sessions
depending upon the type of session. The route selection
algorithm is proposed for all the three networks. The 
performances of the proposed scheme are evaluated 
considering only the data class of traffic. It can be extended by 
considering other traffic classes during simulation. 
telecommunication [...] complex, and consequently, interest in
developing broadband integrated service of digital network
technologies like Asynchronous Transfer Mode (ATM) and
Wireless ATM (WATM) is gaining momentum. The changing
traffic pattern and the new technologies used in ATM networks
make the topological design of ATM network a major research
issue. Most of the researchers who dealt with the topological
design problem suggested solutions based on requirement of
expensive exchange based equipments. In this paper we have
proposed a cost effective ATM network model. The design of
ATM networks entails optimization of the network. We have
proposed an Enhanced Genetic Algorithm (GA) based solution
for the optimization of ATM network. The results of the study
show a [...] Simple GA.


KEYWORDS
Asynchronous Transfer Mode, Passive Optical Network,
Genetic Algorithm (GA) [...]

1.0 INTRODUCTION
ATM is a packet oriented [...] mode based on asynchronous
time division multiplexing. ATM is considered to reduce the
complexity of the network and improve the flexibility of
traffic performance [Raychaudhuri and Wilson, 1994]. In ATM,
information is sent out in fixed-size cells. Each cell in ATM
consists of 53 bytes. Out of these 53 bytes, 5 bytes are reserved
for the header field and 48 bytes are reserved for data field.
ATM is Asynchronous as the recurrence of cells sent by an
individual user may not necessarily be periodic. ATM
integrates the multiplexing and switching functions and allows
communication between devices that operate at different
speeds [P. Wong and D. Britland, 1993]. The objective of ATM
network planning is to design the network structure to carry the
estimated traffic and also to minimize the cost of network
[Gerla, 1989, Gerla, Kleinrock, 1977, Routray et. al., 2006].
Over the last decade, many programming models have been
developed [Kim et. al., 1995; Minoux, 1987] which deal with
telecommunication network planning [Liu, 2003]. A large
number of network optimization problems do not have any
standard algorithm that [...] based on the different constraints.
As the models for the design of ATM networks are quite
complex, and involve generally a very large number of integer
and continuous variables, meta-heuristics like simulated
annealing [Rios et. al., 2005] and GA have been used to solve
the design problem [Routray et. al., 2005, Davis et. al., 1993,
Davis et. al., 1987]. Abuali et. al. (1994) present a GA based
algorithm for the capacitated concentrator location problem and
develop a permutation-based representation. The resulting
algorithm out-performed a greedy heuristic on larger problems.
Elbaum & Sidi (1995) consider the problem of designing local
area computer networks which corresponds to the minimum
spanning concentrator location problem. Chardaire et al. (1995)
also use a GA and apply it to uncapacitated and capacitated
versions of SS-CLP. The paper does not describe how they
assign end-users if there are capacity constraints. For
uncapacitated problems [Balakrishnan et. al., 1989], LR finds
better solutions than the GA. However, when tested against
capacitated problems, a GA combined with [...] performs more
consistently [...] of problems.

ATM network planning deals with determining the locations of
the switches and linking the switches [Hasslinger et al.,
2005]. One of the limiting factors in the design of the ATM
network, as can be deduced from the literature cited, is the
requirement of expensive exchange-based equipment. Passive
Optical Network (PON) is a solution to the problem. It provides
a way to gradually introduce fiber optic technology into access
networks while still deploying parts of the traditional copper
line or co-axial cable systems. These networks allow many
different configuration options and as such will place new
demands on network planners. Most of the literature available
with respect to PON ATM pertains to the Steiner tree topology
implementation. In this paper we have addressed the
comprehensive ATM network planning problem, which deals
with the backbone network design using the ring topology.
Ring architecture is considered cost effective in that it offers
high network survivability in the face of node failure and
greater bandwidth sharing [Wu et al., 1998]. The problem of
end-user connectivity with the backbone network has also been
addressed.


2.0 GENETIC ALGORITHM 

GA is a non-traditional optimization technique [Goldberg,
1991] which can be used to optimize the ATM network. GA
operations [Srinivas et al., 1994] can be briefly described as
follows. [...] chromosomes constitute the population, the size
of which is equal to the random number of chromosomes.
Evaluation: Each chromosome in the population is assigned a
specific value [...] have different fitness values. Reproduction
is to increase the number of the good chromosomes and
decrease the number of the poor chromosomes in the next
generation. Crossover: This procedure exchanges genes
between the father and the mother chromosomes. Two
chromosomes are randomly selected from the population as
parent chromosomes. The crossover points are chosen to be
less than the number of genes in the chromosome and then the
genes are swapped between the crossover points. Two new
chromosomes with the genes from both the parent
chromosomes are obtained. This procedure is called two-point
crossover. Mutation: In order to have a new chromosome which
differs from the chromosomes in the population, a mutation
operation is used. A chromosome is randomly selected as the
mutated chromosome. The mutating gene is randomly selected
from the genes in the mutating chromosome and then the value
of this gene is flipped into another value. The operation repeats
until the variation of the mean fitness of the population is very
small. Finally, the best chromosome in the population is
decoded as the solution of the optimization problem. GA has
been used in previous studies with a different perspective, and
in parts, to design ATM networks and to optimize the
bandwidth [Thompson, 2000, G. Carello et al., 2003, Routray
et al., 2006]. The comprehensive ATM network planning
problem has not been dealt with using meta-heuristics.
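
The crossover and mutation operations described above can be sketched as follows. This is only an illustrative sketch, assuming bit-string chromosomes stored as Python lists; the function names and the seeded random generator are hypothetical and are not taken from the paper.

```python
import random

def two_point_crossover(father, mother, rng):
    """Swap the genes lying between two crossover points of the parents."""
    n = len(father)
    # Both crossover points are smaller than the number of genes.
    a, b = sorted(rng.sample(range(n), 2))
    child1 = father[:a] + mother[a:b] + father[b:]
    child2 = mother[:a] + father[a:b] + mother[b:]
    return child1, child2

def flip_mutation(chromosome, rng):
    """Flip one randomly selected gene of a bit-string chromosome."""
    mutated = list(chromosome)
    i = rng.randrange(len(mutated))
    mutated[i] = 1 - mutated[i]
    return mutated

rng = random.Random(0)          # fixed seed, for repeatability only
father = [0] * 8
mother = [1] * 8
child1, child2 = two_point_crossover(father, mother, rng)
mutant = flip_mutation(father, rng)
```

Each child keeps the outer genes of one parent and the middle segment of the other, so at every position the two children together carry exactly the two parental genes.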
Genetic algorithms are based on the evolution of genes. GAs do
not take into consideration the learning generated by cultural
evolution. One of the limitations of GA-based techniques is
quick convergence to local optima. Enhanced GA can be
used to overcome this limitation. In Enhanced GA, local search
algorithms are implemented in the steps of GA to generate
better solutions. The local search algorithm that has been
considered in this paper is the Hill Climbing algorithm. In hill
climbing the basic idea is to always head towards a state which
is better than the current one. If such states are available the
algorithm searches for those states, and if there are no such
states available then the algorithm terminates.
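
A minimal sketch of hill climbing is given below. It assumes a cost to be minimised and a user-supplied neighbourhood function; moving to the best of the better neighbours is one possible choice and is an assumption here, as are all the names and the toy problem.

```python
def hill_climb(state, cost, neighbours):
    """Move to a better neighbouring state until none exists."""
    current = state
    while True:
        better = [s for s in neighbours(current) if cost(s) < cost(current)]
        if not better:
            return current          # no better state available: terminate
        current = min(better, key=cost)

# Toy problem (hypothetical): minimise (x - 7)^2 over the integers.
def toy_cost(x):
    return (x - 7) ** 2

def toy_neighbours(x):
    return [x - 1, x + 1]

best = hill_climb(0, toy_cost, toy_neighbours)
```

In an Enhanced GA the same routine would be applied to chromosomes, with `cost` given by the network objective function.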


3.0 PROBLEM DESCRIPTION 

While planning an ATM network there are two sets of customers
to be considered: the users who would be using the services
through the network, and the company that will be building the
ATM network and maintaining it. Therefore while planning
ATM networks there are two principal objectives to be
considered. One, the network should meet the end-users' needs
in terms of quality of service and cost. Two, for the network
operator it should be as cost effective as possible to install and
maintain the network. The second objective has traditionally
been examined as reducing the first installed cost of the
network. Minimizing the total cost is mainly a matter of finding
shortest paths between the ATM nodes, as in installing a new
network most of the money is spent on digging the cable ducts.





Figure 1: Schematic for ATM planning


PON ATMs can be implemented in several topologies. One
such configuration is a ring structure where the OLT (Optical
Line Termination) in the central office can be seen as the root
and the ONUs (Optical Network Units) as the nodes in the ring.
Customer access points are connected to the ONU in a star
topology [en.wikipedia.org/wiki/GPON]. These devices take
an optical fiber as input and split the signal carried on this fiber
over a number of fibers on the output. Signal attenuation
constraints require that the signal is split at a maximum of
two points between the exchange and the customer. The first
splitting point in the network is called the primary node. The
second point at which the signal is split is called the secondary
node. Typically 32 ONUs [en.wikipedia.org/wiki/GPON] can
be connected to one OLT. The diagram [Fig. 1] shows a ring of
fiber connecting the primary nodes and the method of
connecting the end-users to these primary nodes. In this paper
we have considered the case where there is a single connection
from the primary to the secondary node and from the secondary
node to the customer. This is likely to be the most common
installation strategy for the ATM network, as back-up links are
very expensive.

When installing a new network in the access area, the majority
of the money has to be spent on digging the cable ducts. Thus,
minimizing the total cost is mainly a matter of finding the
shortest street paths which interconnect all ONUs with the
OLT. A city map can be represented by a graph where the
streets are the links, and the street junctions together with the
ONUs and the OLT make up the nodes. In this paper we have
taken the location of the exchange, the location of potential
end-users, and a forecast of these end-users' demand in terms
of number of lines and year as given. The variables are:
primary and secondary node locations, cable sizes and routes,
duct capacity and routes, assignment of end-users to secondary
nodes, and assignment of secondary nodes to primary nodes.
The total cost is to be minimized subject to the constraints of
attenuation, maximum distance between a node and a customer,
and planning rules. The aim of the planner is to satisfy both the
network's end-users and the network operator, by producing a
reliable cost-effective network.

Objective: The objective of the optimization is to install a
minimum net present cost network that satisfies the customer
demand criterion. Let the graph G = (V, E) consist of a set V of
nodes, V = {1,...,n}, and a set E of customers as edges,
E = {1,...,m}. The objective function [Kratica] used to optimize
the backbone network has been taken as:

Objective function = $\sum_{i}\sum_{j} d_{ij} x_{ij} + \sum_{i} f_i y_i$   [1]

[...]   [2]

$x_{ij}$ = 1 : when end node i is connected to concentrator j
$y_i$ = 1 : when secondary node is established, else 0
$f_i$ = cost of secondary node connected to primary node

$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$   [3]

Where,

$x_i$, $y_i$ = co-ordinates of the ATM nodes

Along with the objective function, a heuristic method has been
used to optimize the time at which cable is installed into the
network and to create a network that uses the above
allocations, split levels and positioning. This heuristic yields
the installation strategy [Routray et al.].





4.0 METHODOLOGY 

An integer value is assigned to each link as a pseudo link
weight, which is not correlated to the real cost value of this
edge. The pseudo link weights are only auxiliary parameters.
The fitness has been calculated based on the objective function
given in [Eq. 1]. The position of the primary and secondary
nodes and the associated split levels can be represented using a
simple bit string. An individual in the population is therefore a
combination of two types of genome: a list for representing
allocation, and a bit string for representing split level and
secondary and primary node positions. The two can be evolved
in parallel, and the fitness score of the individual depends on
the performance of both the genomes. Thus the initial problem
is solved, wherein the primary nodes are optimally connected to
the local exchange in the ring topology.


Copy Right © BIJIT - 2011 Vol. 3 No. 1 ISSN 0973 - 5658 


4.1 ENCODING MECHANISM 


The second stage is then to optimize the allocation of end-users
to the secondary nodes and the assignment of secondary nodes
to the primary nodes. For encoding the problem, the following
methodology has been considered. There are m end-users and p
secondary nodes, so a matrix of size p*m is taken. A constant k
is chosen based on the condition of fiber optics, i.e. the
maximum possible distance the signal can be transmitted
without getting attenuated. Initially, the Configuration String
(CS) is taken at random. The CS is created by the mechanism
shown in Fig. 2. The CS follows the constraint that the distance
between the end-node and the secondary node will be less than
or equal to k. Also, to optimize the time at which cable is
installed into the network and to create a network that uses the
above allocations, a heuristic method has been used to achieve
this installation strategy.

[Figure 2 labels: Locations of switches; End-users]

Figure 2: Encoding mechanism


The heuristic used is:

4.1.1 Set year, y = 0.

4.1.2 For each customer with demand in y, connect it to the
      secondary node to which it is assigned by the shortest
      route through the duct network.

4.1.3 For each secondary node connected in the previous
      step, connect it to the primary node to which it is
      assigned.

4.1.4 If y is the final year of the planning period then finish,
      else increment y and go to 4.1.2.

This heuristic has been included in the objective function of a
genetic algorithm so that iteration is not required between the
two stages. Costing of the installation is based on the net
present worth of the plant in the year it is installed.
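
The year-by-year heuristic can be sketched as below. This is a rough illustration under stated assumptions: the shortest-route costs are taken as precomputed inputs, and the yearly discount rate `rate` used for the net present worth is a hypothetical parameter that the paper does not specify. All names are illustrative.

```python
def staged_installation_cost(demand_year, secondary_of, drop_cost,
                             trunk_cost, final_year, rate):
    """Net present cost of installing links in the year they are needed.

    demand_year:  customer -> year in which its demand appears
    secondary_of: customer -> secondary node it is assigned to
    drop_cost:    customer -> cost of its shortest duct route
    trunk_cost:   secondary node -> cost of its link to the primary node
    rate:         yearly discount rate (assumed, not from the paper)
    """
    total = 0.0
    connected = set()                 # secondary nodes already linked
    y = 0                             # step 1: set year
    while True:
        discount = (1.0 + rate) ** y
        newly_used = set()
        for customer, year in demand_year.items():
            if year == y:             # step 2: connect customers with demand in y
                total += drop_cost[customer] / discount
                newly_used.add(secondary_of[customer])
        for node in newly_used - connected:   # step 3: link new secondary nodes
            total += trunk_cost[node] / discount
            connected.add(node)
        if y == final_year:           # step 4: finish at the final year
            return total
        y += 1

cost = staged_installation_cost(
    demand_year={"a": 0, "b": 1},
    secondary_of={"a": "s1", "b": "s1"},
    drop_cost={"a": 10.0, "b": 11.0},
    trunk_cost={"s1": 100.0},
    final_year=1,
    rate=0.1,
)
```

Because the routine returns a single number, it can be called from inside a fitness function, which is in the spirit of including the heuristic in the GA objective.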


4.2 NETWORK OPTIMIZATION USING GA
The following GA algorithm is used for the backbone network:

while (not done)
    for each p in pop
        p.fitness = evaluate(p)
    [...]
    ## select parents for reproduction
    parent1, parent2 = select two random solutions from pop
    [child1, child2] = crossover(parent1, parent2)
    mutate child1, child2
    replace old population with new population


The approach taken is to represent the problem using an
ordered list of customers. The first n customers from the list are
assigned to the first secondary node, the second n customers to
the second node, etc. Fig 4.1 shows an example of this: the first
primary node connects to the first four secondary nodes, which
in turn connect to the first thirty-two customers in the list. This
representation means that the GA cannot generate genomes that
correspond to illegal network configurations.
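
Decoding such an ordered list is a simple slicing operation. The sketch below is illustrative only; the function name and the sample data are hypothetical.

```python
def decode(order, n):
    """Split an ordered customer list into consecutive groups of n customers.

    Group 0 is served by the first secondary node, group 1 by the second,
    and so on. Every permutation of the customers decodes to a legal
    allocation, so no repair step is needed.
    """
    return [order[i:i + n] for i in range(0, len(order), n)]

groups = decode(list(range(10)), 4)
```

Here the ten customers are split into groups of four, four and two, so no secondary node ever serves more than n customers.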


The selection mechanism chosen is roulette wheel
selection. In roulette wheel selection individuals are assigned a
probability of being selected based on their fitness,
$p_i = f_i / \sum_j f_j$, where $p_i$ is the probability that
individual i will be selected, $f_i$ is the fitness of individual i,
and $\sum_j f_j$ represents the sum of the fitness of all
individuals in the population. Similar to using a roulette wheel,
the fitness of an individual is represented as a proportionate
slice of the wheel. The wheel is then spun and the individual on
whose slice it stops becomes a parent.
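
Roulette wheel selection can be sketched as follows. The sketch assumes non-negative fitness values where larger is better; for a cost-minimisation problem the cost would first have to be converted into such a fitness, a step that is not shown. The function name is hypothetical.

```python
import random

def roulette_select(fitness, rng):
    """Return index i with probability fitness[i] / sum(fitness)."""
    total = sum(fitness)
    spin = rng.uniform(0.0, total)     # where the wheel stops
    running = 0.0
    for i, f in enumerate(fitness):
        running += f                   # end of slice i on the wheel
        if spin <= running:
            return i
    return len(fitness) - 1            # guard against floating-point round-off

rng = random.Random(1)                 # fixed seed, for repeatability only
counts = [0, 0]
for _ in range(10000):
    counts[roulette_select([1.0, 3.0], rng)] += 1
```

With fitness values 1 and 3, the second individual should be picked roughly three times as often as the first.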


4.2.3 Crossover 

Two standard crossover operators are chosen for manipulating
the above representation. These are the edge recombination
crossover and the partial match crossover [Goldberg, 1991].


4.2.4 Mutation
New solutions are introduced by the mutation operator.
The values of individual genes are changed and, hence, new
solutions are chosen. Mutation becomes important when, after
some generations, the number of different strings decreases
because strong individuals start dominating. In a situation of
strong dominance of a few strings, the crossover operator alone
would not bring any changes and the search for an optimal
solution would be ended. To partially shift the search to new
locations in the solution space, the mutation operator randomly
alters genes. A mutation rate of 0.01 was taken for GA. The
number of generations [...].


4.2.5 Terminating Condition
The terminating condition has been taken as
1500 generations.
4.2.6 Enhanced GA 








Figure 3: Flowchart of Enhanced GA


In Enhanced GA the population is first processed by the GA
operators and then a local search technique, namely Hill
Climbing, is applied.


5.0 EXPERIMENTAL RESULTS


Enhanced Genetic Algorithm has been used to find an
optimum connection using ring topology to connect the ATM
nodes and to find end-user connectivity. A number of
experiments were conducted with varying population size. For
all the experiments the results were recorded after a fixed
number of 500 generations in our experimental data. The
objective function in [Eq. 1] has been considered. A
crossover rate of 0.6 and a mutation rate of 0.01 have been
considered for the base GA. These parameters were established
empirically from a series of test runs. The graph [Fig. 4] shows
the average normalized cost of the best individual in the
population at each generation for each operator. In the first
phase with 50 ATM nodes it has been observed that the
solutions obtained by Enhanced GA were better than those
solutions obtained by GA.
















The comparison chart between GA and Enhanced GA shows the
best cost average for 50 nodes. The cost of network design
using Enhanced GA was 2986.53, which is better than GA.
Also the time required to generate the solutions by Enhanced
GA was much less than the time required by GA. It was also
observed that with a smaller network size GA performed better
than Enhanced GA, but as the network size increased Enhanced
GA performed better than GA (Tables 1 & 2). The graph (Fig.
5) shows the average normalized cost of the best individual in
the population at each generation for each operator. In the first
phase with 50 ATM nodes it has been observed that the
solutions obtained by Enhanced GA were better than those
solutions obtained by GA.





Table 1: GA and EGA comparison for connecting primary
nodes in ring topology





Table 2: GA and EGA comparison for network design


It was also observed that the time required to generate the
solutions by Enhanced GA was much less than the time
required by GA. In some cases GA gave better results than
Enhanced GA, but the time required was very high in GA. In all
the cases it was observed that GA was slower than Enhanced
GA.

The allocation of end-users to secondary and primary nodes
can be treated as an ordering problem. The approach taken is to
represent the problem using an ordered list of end-users. The
first n end-users from the list are assigned to the first secondary
node, the second n end-users to the second node, etc. Unlike
many optimization techniques, Enhanced GA works effectively
with discontinuous cost functions. The cost of assigning a
customer to a node is calculated by finding the shortest path
from the customer through the network of ducts to the node.
The constraint that has been considered for assigning the
end-users to the secondary nodes is that no more than 8
end-users can be connected to a single secondary node. The
best results are shown for end-user networks using Enhanced
GA [Fig. 6].
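
The shortest-path costing of an assignment can be sketched with Dijkstra's algorithm. The paper does not name the shortest-path method it uses, so Dijkstra's algorithm is an assumption here, and the small duct graph and its node names are hypothetical.

```python
import heapq

def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return float("inf")                   # target not reachable

# Hypothetical duct network: street junctions j1, j2 between customer and node.
ducts = {
    "customer": {"j1": 2.0, "j2": 5.0},
    "j1": {"customer": 2.0, "j2": 1.0, "node": 6.0},
    "j2": {"customer": 5.0, "j1": 1.0, "node": 2.0},
    "node": {"j1": 6.0, "j2": 2.0},
}
cost = shortest_path_length(ducts, "customer", "node")
```

The cheapest route in this toy graph runs customer, j1, j2, node, with a total duct length of 5.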


In the figures a network with 100 end-users has been
considered. It can be observed from the resultant network that
the majority of the nodes in the network obtained by Enhanced
GA supply nearby clusters of end-users. The time taken by GA
is considerably higher than the time required by Enhanced GA.
So for this specific problem of connecting the end-users with
[...] GA.





Figure 6: End-users connected to secondary nodes and 
secondary nodes connected to primary nodes in star topology 
using Enhanced GA. 


6.0 CONCLUSION 
An Enhanced GA based optimization system for ATM network
planning has been designed, implemented and tested. In this
paper we have designed an ATM network using the Enhanced
Genetic Algorithm approach. Considering the strategic and
financial implications for communications providers, cost is a
very important factor in network planning. So it is very
important that fiber networks are implemented in a
cost-effective manner. Minimizing the total cost is mainly a
matter of finding shortest paths between the ATM nodes, as in
installing a new network most of the money is spent on digging
the cable ducts. In this paper we have found the optimal paths
to connect the primary nodes in the ring topology, and have
also connected the end-users optimally with the secondary
nodes in a star network; the secondary nodes are then
connected to the nearest primary node. We have first used
Enhanced GA to connect the primary nodes in ring topology
and have then connected the end-users to the secondary nodes
in star topology using Enhanced GA. The results demonstrate
that an Enhanced GA based optimization approach to network
planning produces good network plans as compared to a simple
GA based approach. An optimization system such as the one
described here will enable a planner to evaluate a large number
of scenarios under different conditions.


ABSTRACT

The cost associated with development of a large and complex
software system is formidable. In today's customer driven
market, improvement of quality aspects in terms of reliability of
the product is also gaining increased importance. But the
resources are limited and the manager has to maneuver within
a tight schedule. In order to meet these challenges, many
organizations are making use of Commercial Off-The-Shelf
(COTS) software. This paper develops a fuzzy multi-objective
optimization approach for selecting the optimal COTS
software product among alternatives for each module in the
development of a modular software system. The problem is
formulated for the consensus recovery block fault tolerant
scheme. In today's ever changing environment, it is arduous to
estimate the precise cost and reliability of software. Therefore,
we develop fuzzy multi-objective optimization models for
selecting optimal COTS software products. Numerical
illustrations are provided to demonstrate the models developed.


KEYWORDS 
Modular software, software reliability, COTS products, fault 
tolerance, fuzzy optimization. 


1.0 INTRODUCTION 

In our modern society, computers are used in diverse areas for
various applications, for example, air traffic control, nuclear
reactors, aircraft, real time military, industrial process control,
and hospital patient monitoring systems. As the functionality of
computer operations becomes more essential and complicated,
and critical software applications increase in size and
complexity, there is a greater need for computer software
reliability.

Software reliability is an important attribute of software
quality, together with functionality, usability, performance,
serviceability, capability, installability, maintainability, and
documentation. Software reliability is hard to achieve, because
the complexity of software tends to be high. While it is hard for
any system with a high degree of complexity, including
software, to reach a certain level of reliability, system
developers tend to push complexity into the software layer,
given the rapid growth of system size and the ease of doing so
by upgrading the software.


Commercial off-the-shelf (COTS) components engineering is
an emerging paradigm for software development. Benefits of
COTS based development include significant reduction in the
development cost and time, and improvement in the
dependability requirement. Commercial off-the-shelf (COTS)
components are used without any code modification and
inspection. The components, which are not available in the
market or cannot be purchased economically, can be developed
within the organization. The Component Based Software
Engineering (CBSE) process model has become a kind of
process model of software development projects [6, 9].
Respective developers of the components provide information
about their quality, normally in terms of reliability. COTS
components are received from the distributor and are used
'as is'. No changes are normally made to their source code.
Only the code that is necessary to integrate these products is
required to be developed in house. Large software systems have
modular structures. The advancement of technology has made
the use of COTS products as modules a possibility. A
component can now be chosen for a module from the number
of alternatives available in the market.

This paper proposes fuzzy multi-objective optimization models
for selecting the best COTS software product for each module.
Software whose failure can have severe repercussions can be
made fault tolerant through redundancy at module level [1].
Because of our present inability to produce error-free software,
software fault tolerance is and will continue to be an important
consideration in software systems. For some applications
software safety is important, and fault tolerance techniques used
in those applications are aimed at preventing catastrophes.
Multi-version software fault tolerance techniques are based on
the assumption that software built differently should fail
differently and thus, if one of the redundant versions fails, at
least one of the others should provide an acceptable output. In
[3, 4] reliability optimization problems for fault tolerant
systems have been discussed. The authors have discussed two
reliability models. In this paper, a fault tolerance architecture
which supports the consensus recovery block scheme is
proposed. In the existing research in this area it is assumed that
a crisp or constant value of all the parameters is known. Jha et
al. formulated a bi-criteria optimization model for selection of
a COTS based software system for the consensus recovery
block scheme by taking crisp estimates of reliability and cost
[5]. However, in practice, it is not possible for management to
get precise values of reliability and cost for a software system.
Or it may happen that they decide not to set precise levels due
to market considerations and are ready to have some tolerance
on their objectives. When the precise values of the parameters
of the problem are not known, the problem becomes a fuzzy
optimization problem and the solution so obtained is a fuzzy
approximation. Gupta et al. proposed a hybrid approach for
selecting the optimal COTS software product in the
development of a modular software system [8].

This paper proposes two fuzzy multi-objective optimization
models for selecting the best COTS software product for each
module. The first optimization model (optimization model-I) of
this paper is a joint optimization problem that maximizes the
system reliability while simultaneously minimizing cost. The
second model (optimization model-II) considers the issue of
compatibility between different alternatives of modules, as it is
observed that some COTS components cannot integrate with
all the alternatives of another module. We assume the existence
of virtual versions, apart from available versions, having
negligible reliabilities and zero costs. Virtual versions are
chosen only when we have insufficient budget. In a situation
where this particular version is chosen, the corresponding
alternative is not to be added to the system. The rest of the
paper is organized as follows. Section 2 introduces the
notations. In Section 3, we develop a crisp model and describe
non-linear S-shape fuzzy membership functions in respect of
both the chosen objectives, viz. the reliability and the cost. In
this section, we also present fuzzy multi-objective optimization
models for selecting the best COTS product for each module.
In Section 4, the models are illustrated with a numerical
example. In Section 5, we furnish our concluding observations.


2.0 NOTATIONS
$R$ : System quality measure
$f_l$ : Frequency of use of function l
$S_l$ : Set of modules required for function l
$R_i$ : Reliability of module i
$L$ : Number of functions the software is required to
perform
$n$ : Number of modules in the software
$m_i$ : Number of alternatives available for module i
$V_{ij}$ : Number of versions available for alternative j of
module i
$C_{ijk}$ : Cost of version k of alternative j of module
i (COTS)
$t_1$ : Probability that next alternative is not invoked upon
failure of the current alternative
$t_2$ : Probability that the correct result is judged wrong
$t_3$ : Probability that an incorrect result is accepted as
correct
$Y_{ij}$ : Event that correct result of alternative j of module
i is accepted
$X_{ij}$ : Event that output of alternative j of module i is
rejected
$r_{ij}$ : Reliability of alternative j of module i
$r_{ijk}$ : Reliability of version k of alternative j of module i
$z_{ij}$ : Binary variable taking value 0 or 1
1, if alternative j is present in module i
0, otherwise


3.0 MULTI-OBJECTIVE OPTIMIZATION MODELS 
SELECTING COTS PRODUCTS 

In this section, we formulate the COTS software product
selection problem as an optimization problem with multiple
objectives. The first optimization model is developed for the
following situations, which also hold good for the second
model, but [...]

The following are the assumptions of the optimization models:

3.0.1 There is a specified budget for the development of the
      software system.

3.0.2 A software system consists of a finite number of
      modules.

3.0.3 A software system is required to perform a known
      number of functions. The program written for a
      function can call a series of modules (≤ n). A failure
      occurs if a module fails to carry out an intended
      operation.

3.0.4 Codes written for integration of modules don't contain
      any bug.

3.0.5 Several alternatives are available for each module.
      Fault tolerant architecture is desired in the modules (it
      has to be within the specified budget). Independently
      developed alternatives (primarily COTS components)
      are attached in the modules and work similar to the
      recovery block scheme discussed in [3, 4].

3.0.6 The cost of an alternative is the development cost, if
      developed in house; otherwise it is the buying price
      for the COTS product. Reliability for all the [...] of a
      module are available.

3.0.8 Other than available cost-reliability versions of an
      alternative, we assume the existence of a virtual
      version, which has a negligible reliability of 0.001
      and zero cost. These components are denoted by index
      one in the third subscript of $x_{ijk}$, $C_{ijk}$ and
      $r_{ijk}$; for example, $r_{ij1}$ denotes the reliability of
      the first version of alternative j for module i.

3.1 MULTI-OBJECTIVE OPTIMIZATION MODEL I 
In the first optimization model it is assumed that the
alternatives of a module are in a consensus recovery block [10].
A consensus recovery block requires independent development
of independent alternatives of a program, which the COTS
components satisfy, and a voting procedure. Upon invocation of
the consensus recovery block all alternatives are executed and
their outputs are submitted to a voting procedure. Since it is
assumed that there is no common fault, if two or more
alternatives agree on one output then that output is designated
as correct. Otherwise the next stage is entered. At this stage the
best version is examined by an acceptance test. If the output is
accepted, it is treated as the correct one. However, if the output
is not accepted, the next best version is subject to testing. This
process continues until an acceptable output is found or all
alternatives are exhausted.

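Problem (P1), having the above assumptions, is:

Maximize $R(X) = \sum_{l=1}^{L} f_l \prod_{i \in S_l} R_i$   (1)

Minimize $C(X) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} \sum_{k=1}^{V_{ij}} C_{ijk} x_{ijk}$   (2)

subject to $X \in S = \{ x_{ijk}$ is a binary variable :

$R_i = [\ldots]$   (3)

$P(X_{ij}) = (1 - t_1)\left[(1 - r_{ij})(1 - t_3) + r_{ij} t_2\right]$

$P(Y_{ij}) = r_{ij}(1 - t_2)$

[...];  j = 1, 2, ..., $m_i$ and i = 1, 2, ..., n   (4)

$\sum_{k} x_{ijk} = 1$   (5)

[...]   (6)

$\sum_{j=1}^{m_i} z_{ij} \ge 1$;  i = 1, 2, ..., n $\}$   (7)
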

Objective function (1) maximizes the system quality (in terms
of reliability) through a weighted function of module
reliabilities. Reliability of modules that are invoked more
frequently during use is given higher weights. Analytic
Hierarchy Process (AHP) can be effectively used to calculate
these weights. Objective function (2) minimizes the overall cost
of the system. Constraint (3) estimates the reliability of module
i. As it has been assumed that the exception raising and control
transfer programs work perfectly, a module fails if all attached
alternatives fail.

Constraint (5) ensures that exactly one version is chosen from
each alternative of a module. It includes the possibility of
choosing a dummy version. Equations (6) and (7) guarantee
that not all chosen alternatives of a module are dummies.
Optimization model-I is a 0-1 bi-criterion integer
programming problem. An example is solved using the software
package LINGO.
It is observed that some alternatives of a module may not be
compatible with alternatives of another module. The next
optimization model II addresses this problem. It is done by
incorporating additional constraints in the optimization model.
This constraint can be represented as $x_{gs} \le x_{hu_t}$, which
means that if alternative s for module g is chosen, then
alternative $u_t$, t = 1, ..., z, has to be chosen for module h.
We also assume that if two alternatives are compatible, then
[...]

$x_{gsq} - x_{hu_tc} \le M y_t$;  q = 2, ..., $V_{gs}$;  c = 2, ..., $V_{hu_t}$;  s = 1, ..., $m_g$   (8)

$\sum_t y_t = z(V_{hu_t} - 2)$   (9)

Constraint (9) ensures that only one alternative is compatible.
Constraints (3) to (7) are the same as in problem (P1).
Constraints (8) and (9) make use of the binary variable $y_t$ to
choose one pair of alternatives from among different alternative
pairs of modules. If more than one compatible alternative
component is to be chosen for redundancy, constraint (9) can
be relaxed as follows.

$\sum_t y_t \le z(V_{hu_t} - 2)$   (10)

Constraint (10) ensures that more than one alternative is
compatible.


3.2 MULTI-OBJECTIVE OPTIMIZATION MODEL II 
Problem (P1) can be transformed to another optimization
problem using the compatibility constraints as follows.

Maximize $R(X) = \sum_{l=1}^{L} f_l \prod_{i \in S_l} R_i$

Minimize $C(X) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} \sum_{k=1}^{V_{ij}} C_{ijk} x_{ijk}$

Subject to
$X \in S$
$x_{gsq} - x_{hu_tc} \le M y_t$
$\sum_t y_t = z(V_{hu_t} - 2)$
$\sum_t y_t \le z(V_{hu_t} - 2)$

Similar constraints can be written for all pairs of compatible
modules.


3.3 SELECTION MODEL FOR COTS SOFTWARE
PRODUCTS BASED ON FUZZY DECISION THEORY
The model formulation for the above problem requires an
estimate of reliability and cost for the various alternative COTS
products in the modules. Due to the changing environment, these
estimates cannot be determined definitely, because cost and
reliability are affected by ambiguous and uncertain factors which
cannot be measured precisely. Also, the decision maker's assessment
of these estimates may be based on incomplete knowledge
about the COTS product itself and other aspects (e.g. the vendor's
credentials). Under such conditions, a decision based
upon a crisp model is not the best decision, since software
development cost is ever changing and it becomes difficult to
estimate the definite cost and reliability of the software.
Therefore the issue of selecting COTS software products
becomes one of a choice from a fuzzy set,
suggestive of the diversity of both the decision maker's
objective functions and the constraints.
Therefore, we formulate a fuzzy multi-objective optimization
model for COTS software product selection based on vague
aspiration levels. The decision maker may decide his aspiration
levels on the basis of past experience and knowledge.
To express vague aspiration levels of the decision maker,
various membership functions have been proposed [13, 14]. A
fuzzy linear programming problem with a nonlinear membership
function results in a nonlinear programming problem. Usually,
a linear membership function is employed to avoid
nonlinearity. However, if the membership function is interpreted as
the fuzzy utility of the decision maker, which describes the
behavior of indifference, preference or aversion towards
uncertainty, a nonlinear membership function is a better
representation than a linear function.
In this paper, we use a logistic function [12], i.e. a nonlinear
S-shaped membership function, to express vague aspiration levels
of the decision maker. The S-shaped membership function is
given by

$$f(x) = \frac{1}{1 + \exp(-\alpha x)}$$

where $\alpha$, $0 < \alpha < \infty$, is a fuzzy parameter which measures
the degree of vagueness. We use this function because
it is easily handled. Also, the logistic membership
function preserves linearity even when the operator "product"
is used instead of the operator "min" to aggregate the overall
satisfaction to arrive at the fuzzy set decision.
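
As a small illustration of the logistic membership function, the sketch below evaluates it for a few values of the fuzzy parameter. The function name `logistic_membership` and the sample values of alpha are assumptions made for this example only; they are not taken from the paper.

```python
import math


def logistic_membership(x, alpha):
    """S-shaped membership f(x) = 1 / (1 + exp(-alpha * x)), with alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / (1.0 + math.exp(-alpha * x))


# Hypothetical alpha values: a larger alpha gives a steeper (less vague) transition.
for alpha in (1.0, 5.0, 20.0):
    values = [round(logistic_membership(x, alpha), 3) for x in (-0.5, 0.0, 0.5)]
    print(alpha, values)
```

Whatever the value of alpha, the membership equals 0.5 at x = 0, which is why the mid-point of an aspiration level is a natural reference for the shifted versions used for reliability and cost.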
In the MOP models proposed in Sections 3.1 and 3.2, the two
objectives, i.e. the reliability and the cost, are considered to be
ambiguous and uncertain. We use the following nonlinear
S-shaped membership functions to express the vague aspiration
levels of the decision maker.

• The membership function of the goal for the
reliability is given by

$$\mu_R(x) = \frac{1}{1 + \exp\left(-\alpha_R \left(\sum_{l=1}^{L} f_l \prod_{i \in S_l} R_i - R_0\right)\right)}$$

where $R_0$ is the mid-point (middle aspiration level for the
reliability) at which the membership function value is 0.5 and
$\alpha_R$ can be given by the decision maker based on his own degree
of satisfaction.

• The membership function of the goal for the cost is
given by

$$\mu_C(x) = \frac{1}{1 + \exp\left(\alpha_C \left(\sum_i \sum_j \sum_k C_{ijk}\, x_{ijk} - C_0\right)\right)}$$

where $C_0$ is the mid-point (middle aspiration level for the
cost) at which the membership function value is 0.5 and $\alpha_C$
can be given by the decision maker based on his own degree of
satisfaction.
Following Bellman-Zadeh’s Maximization principle [2] and 
using the above defined fuzzy membership functions, the fuzzy 
multi-objective optimization model for selecting the COTS 
software products is formulated as follows: 
Problem P

$$\max \lambda$$
$$\text{s.t.} \quad \lambda \le \mu_R(x),$$
$$\lambda \le \mu_C(x),$$
$$0 \le \lambda \le 1,$$

and the constraints (3) to (7).

Fuzzy multi-objective optimization model (P) is solved for the
maximized degree of membership for the fuzzy decision. In
this approach all the fuzzy objectives are treated equivalently.
However, approaches have been discussed in the literature for
situations in which the objectives are not equally important [7,
11].
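
The max-min idea behind Problem P can be sketched by enumerating a handful of candidate solutions and keeping the one with the largest value of min(mu_R, mu_C). This is only an illustration: the paper solves the 0-1 program with LINGO, whereas the code below does a plain enumeration. The candidate reliabilities and costs, the mid-points `r0` and `c0`, and the two alpha values are all hypothetical.

```python
import math


def mu_reliability(r, r0, alpha_r):
    """Membership of the reliability goal: rises towards 1 as r exceeds r0."""
    return 1.0 / (1.0 + math.exp(-alpha_r * (r - r0)))


def mu_cost(c, c0, alpha_c):
    """Membership of the cost goal: falls towards 0 as c exceeds c0."""
    return 1.0 / (1.0 + math.exp(alpha_c * (c - c0)))


def best_by_max_min(candidates, r0, c0, alpha_r, alpha_c):
    """Return (name, lambda) maximizing lambda = min(mu_R, mu_C)."""
    best_name, best_lam = None, -1.0
    for name, (r, c) in candidates.items():
        lam = min(mu_reliability(r, r0, alpha_r), mu_cost(c, c0, alpha_c))
        if lam > best_lam:
            best_name, best_lam = name, lam
    return best_name, best_lam


# Hypothetical candidates: name -> (system reliability, total cost).
candidates = {"A": (0.70, 25.0), "B": (0.80, 30.0), "C": (0.90, 45.0)}
name, lam = best_by_max_min(candidates, r0=0.75, c0=32.0, alpha_r=16.0, alpha_c=0.5)
print(name, round(lam, 3))
```

Candidate A is cheap but unreliable and candidate C is reliable but expensive, so the balanced candidate B obtains the largest joint satisfaction level.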


4.0 ILLUSTRATIVE EXAMPLES 

Consider a software system having two modules with more
than one alternative for each module. The cost-reliability data
set is given in Table-1. Note that the cost of the first version, i.e. the
virtual version, is zero for all alternatives and its reliability is
0.001. This is done for the following reason: if, in the optimal
solution, $x_{ij1} = 1$ for some module, the corresponding
alternative is not to be attached in the module.

Let $L = 3$, $S_1 = \{1, 2\}$, $S_2 = \{1\}$, $S_3 = \{2\}$, $f_1 = 0.5$, $f_2 = 0.3$ and $f_3$ =

It is also assumed that $t_1 = 0.01$, $t_2 = 0.05$ and $t_3 = 0.01$.
[Figure: Structure of software, relating functions to modules]

= 0.60 and a, =16 


By taking @p 


4.1 OPTIMIZATION MODEL I 
The problem is solved using software package LINGO [8]. 
Following solution is obtained. 

411 = 42 = 432 =1 
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X11 = 2020 = 2032 = X42 =1 
It is observed that two or more alternatives are chosen for each
module. Redundancy is allowed for both the modules. The
system reliability for the above solution is 0.79, the cost is 30.5
units, and the achievement level of the membership function is
$\lambda = 0.58$.


4.2 OPTIMIZATION MODEL II
To illustrate the optimization model for compatibility, we use
the previous results. We assume that the second alternative of the
second module is compatible with the second and third alternatives
of the first module. The following solution was obtained using LINGO.
31172123 = 432 71 
211 735227353 = X43 =1 
It is observed that, due to the compatibility condition, the second
alternative of the first module is chosen, as it is compatible with
the second alternative of the second module. The system reliability for
the above solution is 0.79, the cost is 32 units, and the
achievement level of the membership function is $\lambda = 0.58$.


5.0 CONCLUSION 

In this paper, a fuzzy multi-objective optimization model
approach for selecting the optimal COTS software product
among the alternatives for each module in the development of a
modular software system is discussed. The problem is
formulated for the consensus recovery block fault tolerant scheme.
In today's ever changing environment, it is arduous to estimate
the precise cost and reliability of software. In a situation
where the software is developed by assembling COTS software
products, it is not possible to get crisp estimates of the cost
and reliability of these COTS products. Therefore, we have
drawn on fuzzy methodology for the estimation of reliability
and cost. The developed approach can effectively deal with the
vagueness and subjectivity of expert information.
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ABSTRACT 

A challenging issue for the mechanical industry is to develop fast
and reliable fault diagnosis systems that act before total breakdown
of a machine. Fault diagnosis and classification of faults in
mechanical systems is a difficult task, but it improves productivity
and reduces the cost of production. This paper presents an
approach for classification of commonly observed faults in the
gears of a mechanical system. These faults include a worn gear,
a gear with one tooth broken and a gear with a crack on one tooth.
The Power Spectral Density (PSD) of the vibration signals of
faulty gears is used to construct feature vectors. Principal
component analysis (PCA) is used to reduce the dimensions of the
feature vector. The routine checkup of the machine generates
known fault vectors. ISODATA (Iterative Self-Organizing
Data Analysis Technique) [1] classifies these fault vectors along with
a newly collected fault vector. If the fault is different from the
previously identified faults, a new fault cluster is created; otherwise
the new fault belongs to one of the previously identified fault clusters.


1.0 INTRODUCTION 

The complexity of engineering systems increases the danger of
failure of a system or machine. This affects productivity and the
environment. With complex machines the maintenance cost
increases. Hence fast and precise identification of faults is
essential.

A fault can be defined as an abnormal state of a machine or
system, such as malfunction or dysfunction of a part or an
assembly.

The critical element in any machine is the gear. A study carried
out in Germany on samples of gears shows that 19-24% of
failures of mechanical systems are usually because of mishandling
or inadequate maintenance. This study also shows that damage
or failure caused by gears and bearings is in the ratio of 3:1. About
60% of failures are because of faults in gears, 19% of failures are
because of faults in bearings and 10% of failures are because of
faults in shafts.

The process of fault diagnosis consists of fault detection and
classification of the fault. A fault in a gear can be detected by
using the vibrations generated from it.

The vibration analysis of a machine requires detailed knowledge
of the mechanical system and the dynamic properties of the machine,
along with the history of its maintenance.

This paper provides an approach for identifying the type of
fault that has occurred in a gear system. It provides a novel approach
of using a pattern recognition algorithm named ISODATA for
classification of faults in a gear system. The computing
efficiency of the classifier is improved by reducing the
dimensionality of the feature vectors.

The present methods of fault classification include the use of
Learning Machine, Hoelder Exponents, PCA, ANN, Support
Vector Machine, Generalized Discriminant Analysis, WT-ANN,
Case Based Reasoning, etc.


2.0 THE ISODATA ALGORITHM [1] 

ISODATA stands for Iterative Self-Organizing Data Analysis
Technique. This is a more sophisticated algorithm which
allows the number of clusters to be automatically adjusted
during the iteration by merging similar clusters and splitting
clusters with large standard deviations.

We first define the following parameters:

1. $K$ = number of clusters desired;

2. $I$ = maximum number of iterations allowed;

3. $P$ = maximum number of pairs of clusters which can be merged;

4. $\theta_N$ = a threshold value for the minimum number of samples
each cluster can have (used for discarding clusters);

5. $\theta_S$ = a threshold value for the standard deviation (used for
the split operation);

6. $\theta_C$ = a threshold value for pairwise distances (used for the
merge operation).

The algorithm:

Step 1: Arbitrarily choose $k$ (not necessarily equal to $K$) initial
cluster centers $M_j$, $j = 1, \ldots, k$, from the data set.

Step 2: Assign each of the $N$ samples to the closest cluster
center:

$$x \in W_j \ \text{ if } \ D(x, M_j) = \min \{ D(x, M_i),\ i = 1, \ldots, k \}$$

Step 3: Discard clusters with fewer than $\theta_N$ members, i.e., if for
any $j$, $N_j < \theta_N$, then discard $W_j$ and let $k \leftarrow k - 1$.

Step 4: Update each cluster center:

$$M_j = \frac{1}{N_j} \sum_{x \in W_j} x, \quad j = 1, \ldots, k$$

Step 5: Compute the average distance $D_j$ of the samples in cluster
$W_j$ from their corresponding cluster center:

$$D_j = \frac{1}{N_j} \sum_{x \in W_j} D(x, M_j), \quad j = 1, \ldots, k$$

Step 6: Compute the overall average distance of the samples
from their respective cluster centers:

$$D = \frac{1}{N} \sum_{j=1}^{k} N_j D_j$$

Step 7: If $k \le K/2$ (too few clusters), go to Step 8; else if $k > 2K$
(too many clusters), go to Step 11; else go to Step 14.
(Steps 8 through 10 are for the split operation, Steps 11 through 13
are for the merge operation.)

Step 8: First step to split. Find the standard deviation vector
$\sigma_j = [\sigma_1^{(j)}, \ldots, \sigma_n^{(j)}]^T$ for each cluster:

$$\sigma_i^{(j)} = \sqrt{\frac{1}{N_j} \sum_{x \in W_j} \left(x_i - m_i^{(j)}\right)^2}, \quad i = 1, \ldots, n,\; j = 1, \ldots, k$$

where $m_i^{(j)}$ is the $i$th component of $M_j$ and $\sigma_i^{(j)}$ is the standard
deviation of the samples in $W_j$ along the $i$th coordinate axis. $N_j$
is the number of samples in $W_j$.

Step 9: Find the maximum component of each $\sigma_j$ and denote it
by $\sigma_{\max}^{(j)}$. Do this for all $j = 1, \ldots, k$.

Step 10: If for any $\sigma_{\max}^{(j)}$, $j = 1, \ldots, k$, all of the
following are true:

$$\sigma_{\max}^{(j)} > \theta_S, \qquad D_j > D, \qquad N_j > 2\theta_N$$

then split $M_j$ into two new cluster centers $M_j^{(+)}$ and $M_j^{(-)}$ by
adding $\pm\delta$ to the component of $M_j$ corresponding to $\sigma_{\max}^{(j)}$,
where $\delta$ can be $\alpha\,\sigma_{\max}^{(j)}$ for some $\alpha > 0$. Then delete $M_j$ and
let $k \leftarrow k + 1$. Go to Step 2; else go to Step 14.

Step 11: First step to merge. Compute the pairwise distances $D_{ij}$
between every two cluster centers:

$$D_{ij} = D(M_i, M_j), \quad 1 \le i < j \le k$$

and arrange these $k(k-1)/2$ distances in ascending order.

Step 12: Find no more than $P$ smallest $D_{ij}$'s which are also
smaller than $\theta_C$ and keep them in ascending order:

$$D_{i_1 j_1} \le D_{i_2 j_2} \le \cdots \le D_{i_P j_P}$$

Step 13: Perform pairwise merge: for $l = 1, \ldots, P$, do the
following:

If neither of $M_{i_l}$ and $M_{j_l}$ has been used in this iteration,
then merge them to form a new center:

$$M = \frac{1}{N_{i_l} + N_{j_l}} \left[ N_{i_l} M_{i_l} + N_{j_l} M_{j_l} \right]$$

Delete $M_{i_l}$ and $M_{j_l}$, and let $k \leftarrow k - 1$. Go to Step 2.

Step 14: Terminate if the maximum number of iterations $I$ is
reached. Otherwise go to Step 2.

The ISODATA algorithm is more flexible than the K-means
method, but the user has to choose empirically many
more parameters, as listed previously.
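
The steps above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: Euclidean distance is assumed for $D$, the split test omits the $D_j > D$ condition, at most one cluster is split and one pair merged per iteration, and at least one cluster is assumed to survive the discard step. All function names are hypothetical.

```python
import numpy as np


def assign(samples, centers):
    """Step 2: index of the nearest center for every sample."""
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def discard_and_update(samples, centers, theta_n):
    """Steps 3-4: drop clusters with fewer than theta_n members, then
    recompute each surviving center as the mean of its members."""
    labels = assign(samples, centers)
    counts = np.bincount(labels, minlength=len(centers))
    kept = centers[counts >= theta_n]
    labels = assign(samples, kept)
    counts = np.bincount(labels, minlength=len(kept))
    new_centers = np.array([samples[labels == j].mean(axis=0)
                            for j in range(len(kept)) if counts[j] > 0])
    return new_centers, assign(samples, new_centers)


def split_widest(samples, centers, labels, theta_s, theta_n, alpha=0.5):
    """Steps 8-10 (simplified): split the first cluster whose largest
    per-axis standard deviation exceeds theta_s."""
    for j in range(len(centers)):
        members = samples[labels == j]
        sigma = members.std(axis=0)
        axis = sigma.argmax()
        if sigma[axis] > theta_s and len(members) > 2 * theta_n:
            delta = np.zeros_like(centers[j])
            delta[axis] = alpha * sigma[axis]
            rest = np.delete(centers, j, axis=0)
            return np.vstack([rest, centers[j] + delta, centers[j] - delta])
    return centers


def merge_closest(centers, labels, theta_c):
    """Steps 11-13 (simplified to one pair): merge the two closest centers
    if they are nearer than theta_c, weighting by cluster sizes."""
    k = len(centers)
    counts = np.bincount(labels, minlength=k)
    best = None
    for i in range(k):
        for j in range(i + 1, k):
            d = np.linalg.norm(centers[i] - centers[j])
            if d < theta_c and (best is None or d < best[0]):
                best = (d, i, j)
    if best is None:
        return centers
    _, i, j = best
    merged = (counts[i] * centers[i] + counts[j] * centers[j]) / (counts[i] + counts[j])
    return np.vstack([np.delete(centers, [i, j], axis=0), merged])


def isodata(samples, init_centers, k_desired, max_iter, theta_n, theta_s, theta_c):
    """Iterate assign/discard/update, splitting or merging as in Step 7."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(max_iter):
        centers, labels = discard_and_update(samples, centers, theta_n)
        if len(centers) <= k_desired / 2:
            centers = split_widest(samples, centers, labels, theta_s, theta_n)
        elif len(centers) > 2 * k_desired:
            centers = merge_closest(centers, labels, theta_c)
    return discard_and_update(samples, centers, theta_n)


# Two well-separated toy groups, starting from a single center.
samples = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [-0.1, 0],
                    [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1], [4.9, 5]], dtype=float)
centers, labels = isodata(samples, [[2.5, 2.5]], k_desired=2, max_iter=5,
                          theta_n=2, theta_s=1.0, theta_c=0.5)
print(centers, labels)
```

Starting from one center, the split step separates the two groups in the first iteration, after which the centers settle on the group means.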


3.0 EXPERIMENTAL SET UP 

The set up consists of a half-HP induction motor mounted on a rigid
steel structure. The driver gear is mounted on the motor shaft. The
load is coupled to the driver gear by the driven gear. The machine
runs at a constant speed of 1470 RPM at constant load (80 % of rated
capacity). Both gears are identical, having 62 teeth. Different
fault conditions were created on the driven gear, typically a worn
gear, a cracked tooth and a broken tooth. Figure 1 shows a photograph
of the model of the experimental set up kept on rubber pads. These
pads are used to reduce weak foundation fault effects on the feature
vector sets.

Figure 1: Model of experimental set up

Using this set up, the faulty vibration signatures are collected by an
accelerometer and stored in the memory of a computer.
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3.1 ARTIFICIALLY CREATION OF FAULTS ON GEAR 
TOOTH

The common faults observed in spur gears are:

3.1.1 Worn gear: This fault was created by filing the gear
teeth in both directions of rotation to remove
material from the teeth up to 500 microns.

Figure 2: Worn gear

3.1.2 One tooth broken or missing: To get the signal for this
condition, a gear tooth was removed with a hack-saw
blade.

Figure 3: Gear with one tooth broken or missing

3.1.3 Crack on one tooth: The signal for this condition is
obtained by cutting the tooth with a hack-saw blade at
the root of the tooth in the direction of rotation.

Figure 4: Gear with crack on one tooth


3.2 CONSTRUCTION OF ACCELEROMETER
The accelerometer used in this set up uses a ring-type crystal as
a sensor, which has a mass attached to one of its surfaces.
When the mass is subjected to a vibration signal, the mass
converts the vibration (acceleration) to a force, which is then
converted to an electrical signal representative of the incoming
vibration signal, as shown in Figure 5. This is the basis
of the accelerometer's operation.

[Figure 5 labels: pre-loading spring, mass, crystal element, mounting stud]

Figure 5: Accelerometer


3.3 PROCEDURE FOR RECORDING SIGNAL

3.3.1 The motor is run at the rated speed of 1470 rpm. Load
is applied through a pulley system.

3.3.2 The accelerometer is mounted near the driven gear and
its output is connected to the microphone input of the sound
card of the computer.

3.3.3 First, the readings for the non-defective, well lubricated
gear condition are recorded using 'Gold wave' software
for a period of one minute. They are stored for further
analysis and comparison with the signals derived
from the faulty gears mentioned above.

3.3.4 The non-defective gear is then removed using a gear
puller and replaced by the faulty gears. For each faulty
gear, the signal derived from the accelerometer is recorded and
stored in the memory of the computer in wave file format for
further analysis.

3.3.5 For each reading, load and speed conditions are kept
constant.

3.3.6 The accelerometer signal is sampled at a
rate of 44100 samples per second. Each sample is
16 bits, with the MSB reserved for sign. The gear mesh frequency
of the machine under observation is 24.5 RPS x 62 teeth
= 1519 Hz. The second and third harmonics show
significant amplitude and sidebands along the gear
mesh frequency harmonic. Hence a sampling rate of
44100 samples per second proved sufficient
throughout experimentation.
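
The sampling argument in 3.3.6 can be checked with a short calculation: the gear mesh frequency follows from the shaft speed and tooth count, and the Nyquist limit (half the sampling rate) bounds the highest harmonic that can be represented. The function names below are hypothetical helpers introduced for this sketch.

```python
def gear_mesh_frequency(rpm, teeth):
    """Gear mesh frequency in Hz: shaft revolutions per second times tooth count."""
    return rpm / 60.0 * teeth


def max_harmonic_below_nyquist(fundamental_hz, sample_rate_hz):
    """Largest harmonic number whose frequency does not exceed the Nyquist limit."""
    nyquist = sample_rate_hz / 2.0
    return int(nyquist // fundamental_hz)


gmf = gear_mesh_frequency(1470, 62)          # 24.5 rev/s * 62 teeth
harmonics = max_harmonic_below_nyquist(gmf, 44100)
print(gmf, harmonics)
```

With a Nyquist limit of 22050 Hz, the second and third harmonics (3038 Hz and 4557 Hz) and their sidebands lie well inside the representable band.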
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4.0 RESULTS 
Figure 6 shows the vibration signal for each type of gear. 
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By observing above signatures/signals its ee 
type of fault. 


4.1 KURTOSIS 

Kurtosis can be used to check the distribution of a signal.
Kurtosis is a measure of how outlier-prone a distribution is.
The kurtosis of the normal distribution is 3. Distributions that
are more outlier-prone than the normal distribution have
kurtosis greater than 3; distributions that are less outlier-prone
have kurtosis less than 3.
The kurtosis of a distribution is defined as

$$k = \frac{E(x - \mu)^4}{\sigma^4}$$

where

$\mu$ is the mean of $x$,

$\sigma$ is the standard deviation of $x$,

$E(t)$ represents the expected value of the quantity $t$.

The kurtosis of the good gear signal is low, indicating a normal
signal distribution, while the other signals show higher kurtosis.

The observations in the table show a different kurtosis value for
each type of gear signal, but from this value we cannot predict the
type of fault in the gear. Hence some type of intelligent system
should be used to identify the fault. This paper uses
ISODATA to identify the fault.
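
The kurtosis definition above can be illustrated with a few lines of Python. The signals here are synthetic and purely hypothetical (Gaussian noise, and the same noise with periodic impulses added); they stand in for, and are not, the recorded gear signals.

```python
import random


def kurtosis(samples):
    """Kurtosis k = E[(x - mu)^4] / sigma^4, using population moments."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    fourth = sum((x - mean) ** 4 for x in samples) / n
    return fourth / variance ** 2


rng = random.Random(0)
# Hypothetical "smooth" signal: Gaussian noise, kurtosis close to 3.
smooth = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Hypothetical "impulsive" signal: the same noise with a spike every 100 samples.
impulsive = [x + (10.0 if i % 100 == 0 else 0.0) for i, x in enumerate(smooth)]

print(round(kurtosis(smooth), 2), round(kurtosis(impulsive), 2))
```

The impulsive signal is more outlier-prone, so its kurtosis is well above that of the Gaussian signal, which mirrors the behavior described for faulty gears.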


4.2 FEATURE VECTOR GENERATION AND PCA

The feature vectors are generated by determining the 256-point
Power Spectral Density (PSD) of the fault signal. This produces a
set of feature vectors from each type of class/fault
signal. Figure 7 shows the power spectral density of each type of
fault.

Figure 7: PSD of all types of signals of gear vibration.

The dimensionality of this feature vector is large and may result
in a long classification or training period. Hence it is necessary
to reduce the dimensionality of the input vector. For this,
Principal Component Analysis (PCA) is used.

PCA removes redundant information. PCA has three effects:
4.2.1 It orthogonalizes the components of the input vectors, so
that they are uncorrelated with each other.

4.2.2 It orders the resulting orthogonal components (principal
components) so that those with the largest variation come first.

4.2.3 It eliminates those components which contribute least
to the variation in the data set.
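
A minimal NumPy sketch of the two stages described above, under stated assumptions: the PSD is estimated as an averaged periodogram over non-overlapping Hann-windowed frames of 256 samples, and PCA keeps the components that explain at least a chosen fraction of the total variance. The function names, the window, and the `min_fraction` threshold are assumptions for this sketch; the paper does not state which PSD estimator or PCA cut-off was used.

```python
import numpy as np


def psd_features(signal, n_fft=256):
    """Averaged periodogram over non-overlapping n_fft-sample frames."""
    n_frames = len(signal) // n_fft
    frames = np.reshape(signal[:n_frames * n_fft], (n_frames, n_fft))
    spectra = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    return spectra.mean(axis=0)  # n_fft // 2 + 1 values


def pca_reduce(features, min_fraction=0.02):
    """Project rows onto principal components explaining >= min_fraction of variance."""
    centered = features - features.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    keep = variance / variance.sum() >= min_fraction
    return centered @ vt[keep].T


# Synthetic check signal: a sine placed exactly on frequency bin 32 of a 256-point FFT.
test_signal = np.sin(2 * np.pi * 32 * np.arange(1024) / 256)
psd = psd_features(test_signal)

# Hypothetical feature matrix: 30 feature vectors of dimension 10.
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 10))
reduced = pca_reduce(features)
print(psd.shape, reduced.shape)
```

The projected columns are mutually uncorrelated and ordered by decreasing variance, which corresponds to the three effects listed above.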
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4.3 CLASSIFICATION OF FAULTS (USING ISODATA) 


4.3.1 Classified fault database

Figure 10: Classified cracked tooth gear fault

Figure 11: Classified broken or missing tooth gear fault

The following table shows the success of the ISODATA algorithm in
classifying various signatures of gear fault.

Table 2: Success of ISODATA


5.0 CONCLUSION 

The ISODATA algorithm classifies all types of faults. On
some occasions it fails to distinguish between faults. This
happens if the fault feature vectors are in close vicinity, e.g. it
fails to distinguish between the signals of a broken tooth and a
cracked tooth, as depicted in Figure 11 above. This is a limitation
of the algorithm, and it could be overcome by finding a better
vibration signal processing method that keeps the feature vectors
apart.
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Editorial 


It is a matter of both honor and pleasure for us to put forth the sixth issue of BIJIT, the BVICAM's
International Journal of Information Technology. This issue of the journal presents a compilation of ten
papers that span a broad variety of research topics in various emerging areas of Information Technology
and Computer Science. Some application oriented papers, having novelty in application, have also been
included in this issue, hoping that usage of these would enrich the knowledge base and facilitate the overall
economic growth. This issue shows our commitment in realizing our vision “to achieve a standard
comparable to the best in the field and finally become a symbol of quality”.


As a matter of policy of the Journal, all the manuscripts received and considered for the Journal by the
editorial board are double blind peer reviewed independently by at least two referees. Our panel of expert
referees possess a sound academic background and have a rich publication record in various prestigious
journals representing Universities, Research Laboratories and other institutions of repute, which, we intend
to further augment from time to time. Finalizing the constitution of the panel of referees, for double blind
peer review(s) of the considered manuscripts, was a painstaking process, but it helped us to ensure that the
best of the considered manuscripts are showcased and that too after undergoing multiple cycles of review,
as required.


The ten papers that were finally published were chosen out of more than eighty papers that we received
from all over the world for this issue. We understand that the confirmation of final acceptance, to the
authors / contributors, is delayed, but we also hope that you concur with us in the fact that quality review is
a time taking process and is further delayed if the reviewers are senior researchers in their respective fields
and hence, are hard pressed for time.


We wish to express our sincere gratitude to our panel of experts in steering the considered manuscripts
through multiple cycles of review and bringing out the best from the contributing authors. We thank our
esteemed authors for having shown confidence in BIJIT and considering it a platform to showcase and
share their original research work. We would also wish to thank the authors whose papers were not
published in this issue of the Journal, probably because of the minor shortcomings. However, we would
like to encourage them to actively contribute for the forthcoming issues.


The undertaken Quality Assurance Process involved a series of well defined activities that, we hope, went a
long way in ensuring the quality of the publication. Still, there is always a scope for improvement, and so
we request the contributors and readers to kindly mail us their criticism, suggestions and feedback at
bijit@bvicam.ac.in and help us in further enhancing the quality of forthcoming issues.
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| A Framework for Hierarchical Clustering Based Indexing in Search Engines 


ABSTRACT 
Granting efficient and fast access to the index is a key issue
for the performance of Web Search Engines (WSEs). In order to enhance
memory utilization and favor fast query resolution, WSEs use
Inverted File (IF) indexes that consist of an array of
posting lists, where each posting list is associated with a term
and contains the term as well as the identifiers of the documents
containing the term. Since the document identifiers are stored in
sorted order, they can be stored as the differences between
successive document identifiers so as to reduce the size of the index.
This paper describes a clustering algorithm that aims at
partitioning the set of documents into ordered clusters so that
the documents within the same cluster are similar and are
assigned closer document identifiers. Thus the average
value of the differences between successive document identifiers will
be minimized and hence storage space would be saved. The
paper further presents an extension of this clustering algorithm
to hierarchical clustering, in which similar
clusters are clubbed to form a mega cluster and similar mega
clusters are then combined to form a super cluster. Thus the
paper describes different levels of clustering, which
optimize the search process by directing the search
along a specific path from higher levels of clustering to the lower
levels, i.e. from super clusters to mega clusters, then to clusters
and finally to the individual documents, so that the user gets the
best possible matching results in the minimum possible time.
Keywords:
Identifiers Assignment, Hierarchical Clustering
1 INTRODUCTION 
The indexing phase [1] of a search engine can be viewed as a
Web Content Mining process. Starting from a collection of
unstructured documents, the indexer extracts a large amount of
information, such as the list of documents which contain a given
term. It also keeps account of the number of occurrences of
each term within every document. This information is
maintained in an index, usually represented as an inverted
file (IF). IF is the most widely adopted format for this purpose
due to its relatively small size occupancy and the efficiency
involved in resolution of keyword-based queries. The
index consists of an array of posting lists, where each
posting list is associated with a term and contains the term as
well as the identifiers of the documents containing the term.
Since the document identifiers are stored in sorted order, they
can be stored as the differences between successive
document identifiers so as to reduce the size of the index. Storing the
differences requires coding of small integer values [7], which can
be encoded with a small number of bits and also aids in
compressing the index. So if similar documents [1] are
assigned closer document identifiers, then in the posting
lists the average value of the difference between successive
document identifiers will be minimized and hence storage space
would be saved. For example, consider the posting list ((job; 5) 1, 4, 14,
20, 27), indicating that the term job appears in five documents
having the document identifiers 1, 4, 14, 20 and 27 respectively. The
above posting list can be written as ((job; 5) 1, 3, 10, 6, 7),
where the items of the list represent the differences between
successive document identifiers. Figure 1 shows
example entries in the index file.


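[Figure 1: Example entries in the index file, with columns for the term, the number of documents, the document identifiers, and the identifiers stored as differences; e.g. identifiers 3, 6, 9, 12, ... are stored as 3, 3, 3, 3, ...]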
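
The difference (d-gap) representation of a posting list can be sketched as follows, using the posting list for the term job from the example above. The function names `to_d_gaps` and `from_d_gaps` are hypothetical helpers introduced for this illustration.

```python
def to_d_gaps(doc_ids):
    """Convert a sorted list of document identifiers into successive differences."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_d_gaps(gaps):
    """Recover the document identifiers by accumulating the differences."""
    doc_ids, current = [], 0
    for gap in gaps:
        current += gap
        doc_ids.append(current)
    return doc_ids


posting = [1, 4, 14, 20, 27]          # documents containing the term "job"
gaps = to_d_gaps(posting)
print(gaps)                            # [1, 3, 10, 6, 7]
print(from_d_gaps(gaps) == posting)    # True
```

The first identifier is stored as is, and every later entry is replaced by its distance from the previous one, so closer identifiers yield smaller numbers to encode.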

Clustering is a widely adopted technique aimed at dividing a
collection of data into disjoint groups of homogeneous elements.
Document clustering [3] has been widely investigated;
documents relating to a certain topic will hopefully be placed in
a single cluster. So if the documents are clustered, comparisons
of the documents against the user's query are only needed with
certain clusters and not with the whole collection of documents.
Fast information retrieval can be further achieved by
hierarchical clustering, in which similar clusters are merged
together to form higher levels of clustering. In this paper, the
proposed heuristic exploits a text clustering algorithm that
reorders the collection of documents on the basis of document
similarity. The reordering is then used to assign close document
identifiers to similar documents, thus reducing the differences
between the document identifiers and enhancing the
compressibility of the IF index representing the collection. The
proposed clustering algorithm aims at partitioning the set of
documents into k ordered clusters on the basis of a similarity
measure, so that the documents on the web are assigned
identifiers in such a way that similar documents are
assigned closer document identifiers. Further, an extension
of this clustering algorithm is presented for
hierarchical clustering [5], in which similar clusters are clubbed
to form a mega cluster and similar mega clusters are then
combined to form a super cluster. Thus different levels of
clustering are defined, which aids in better indexing. As a
result of clustering, the size of the index gets compressed and,
moreover, the search process is optimized by directing the
search along a specific path from higher levels of clustering to the
lower levels, i.e. from super clusters to mega clusters, then to
clusters and finally to the individual documents, so that the user
gets the best possible matching results in the minimum possible
time.
2 RELATED WORK 
In this section, a review of previous work on document clustering
algorithms is given. In the field of clustering, many algorithms
have already been proposed, but they seem to be less efficient,
which makes the process
of clustering less effective. The K-means algorithm [4, 6] has
been proposed in this direction. It initially chooses k
documents as cluster representatives and then assigns the
remaining n-k documents to one of these clusters on the basis of
similarity between the documents. New centroids for the k
clusters are recomputed and documents are reassigned
according to their similarity with the k new centroids. This
process repeats until the positions of the centroids become stable.
Computing new centroids is expensive for large values of n, and
the number of iterations required to converge may be large.
Another work proposed was the reordering algorithm [1], which
partitions the set of documents into k ordered clusters on the
basis of a similarity measure. According to this algorithm, the
biggest document is selected as the centroid of the first cluster and
the n/k - 1 most similar documents are assigned to this cluster. Then
the biggest remaining document is selected and the same process repeats.
The process keeps on repeating until all the k clusters are
formed and each cluster is completed with n/k documents.
This algorithm is not effective in clustering the most similar
documents. The biggest document may not have similarity with
the other documents in its cluster.

Another work was the threshold based clustering algorithm.
Here, two documents are classified to the same cluster if the
similarity between them is below a specified threshold. This
threshold is defined by the user before the algorithm starts. It is
easy to see that if the threshold is small, all the elements will get
assigned to different clusters. If the threshold is large, the
elements may get assigned to just one cluster. Thus the
algorithm is sensitive to the specification of the threshold.

Fuzzy Co-clustering of Web Documents is a technique to
simultaneously cluster data (or objects) and features. In the case of
the web, web documents are the data, and the words inside the
documents are the features. By performing simultaneous
clustering of documents and words, meaningful clusters of
highly coherent documents can be generated relative to the
highly relevant words, as opposed to clusters of documents with
respect to all the words as in the case of a standard clustering
algorithm. The FCCM algorithm proposed in this direction is aimed
at clustering data in which attributes can be categorical
(nominal) and the distance or similarity between two patterns is
not explicitly available. FCCM accomplishes this task by
maximizing the degree of 'aggregation' among the clusters. The
major drawback of FCCM is that it poses problems when the
number of documents or words is large. Moreover, this algorithm is
less effective when the data has a large number of overlapping
clusters.


The algorithm proposed in this paper tries to remove the
shortcomings of the existing algorithms. It produces a better
ordering of the documents in the cluster. This algorithm picks
the first document as the cluster representative, then selects the
document most similar to it and puts it in the cluster. It further
selects the document which is most similar to the currently selected
document, and repeats until the first cluster becomes full with
n/k documents. The same process is then repeated to form the
rest of the clusters. Thus the most similar documents are
accumulated in the same cluster and are assigned consecutive
document identifiers. Thus the algorithm is more efficient in
compression of the index.


PROPOSED ALGORITHM FOR CLUSTERING 
BASED INDEXING 


Let D = {D1, D2, ..., DN} be a collection of N textual 
documents to which consecutive integer document identifiers 1, 
..., N are initially assigned. Moreover, let T be the number of 
distinct terms ti, i = 1, ..., T present in the documents, and $\mu_t$ the 
average length of terms. The total size CSize(D) [27] of an IF 
index for D can be written as: 

$$CSize(D) = CSize_{lexicon}(T \cdot \mu_t) + \sum_{i=1}^{T} Encode_m(d\_gaps(t_i))$$

where $CSize_{lexicon}(T \cdot \mu_t)$ is the number of bytes needed to code 
the lexicon, while d_gaps(ti) is the d_gap [28] representation of 
the posting list associated to term ti, and Encode_m is a function 
that returns the number of bytes required to code a list of d_gaps 
according to a given encoding method m. 
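
The effect of the second term can be illustrated with a small sketch. It assumes, purely for illustration, that the encoding method m is a variable-byte code with 7 payload bits per byte; the posting lists and the helper names d_gaps and vbyte_size are hypothetical and are not taken from the paper.

```python
def d_gaps(postings):
    """Turn a sorted posting list of document identifiers into d-gaps."""
    gaps = []
    previous = 0
    for doc_id in postings:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def vbyte_size(gaps):
    """Bytes needed to code the gaps, assuming 7 payload bits per byte."""
    total = 0
    for gap in gaps:
        nbytes = 1
        while gap >= 128:
            gap >>= 7
            nbytes += 1
        total += nbytes
    return total


# Hypothetical posting lists for one term, before and after reordering.
scattered = [3, 400, 70000, 150000]
clustered = [1000, 1001, 1002, 1003]

print(d_gaps(scattered), vbyte_size(d_gaps(scattered)))
print(d_gaps(clustered), vbyte_size(d_gaps(clustered)))
```

When the documents containing a term receive consecutive identifiers, most gaps become 1 and each fits in a single byte, which is the intuition behind reordering documents before building the index.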

The compression of the index is achieved by applying clustering to 
the web pages so that similar web pages are in the same 
cluster and hence are assigned closer identifiers. A clustering 
algorithm has been proposed, which converts the individual 
documents into k ordered clusters, and hence documents are 
reassigned new document identifiers so that the documents in 
the same cluster get consecutive document identifiers. The 
clustering of the documents is done on the basis of similarity 
between the documents, which is first of all calculated using 
some similarity measure. The proposed architecture for the 
clustering based indexing system is given in figure 2. 

Figure 2: Architecture of Clustering based indexing System in Search Engines 


A. COMPUTING THE SIMILARITY MATRIX 
Let D = {D1, D2, ..., Dn} be the collection of n textual 
documents being crawled, to which consecutive integer 
document identifiers 1, ..., n are assigned. Each document Di can 
be represented by a corresponding set Si such that Si is the set of 
all the terms contained in Di. Let us denote the collection of these 
sets by D*, such that D* = {S1, S2, ..., Sn}. The similarity of any two 
documents Si and Sj can be computed using the similarity 
measure [1]: 
Similarity measure (Si, Sj) = |Si ∩ Sj| / |Si ∪ Sj| 
INPUT - The set D* = {S1, S2, S3, S4, ..., Sn} where Si is the set of 
all the terms of document Di. 

- The number k of clusters to create. 
OUTPUT - k ordered clusters representing a reordering of D 
The algorithm that calculates the similarity of each document 
with every other document using the similarity measure given 
above is given in figure 3. 





Figure 3: Algorithm for computing similarity matrix 


The above algorithm constructs the document similarity matrix 
[15]. Only the entries of the upper triangular matrix need to be 
calculated. The rest of the values in the similarity matrix are 
assigned automatically, as we know that 
similarity measure (i, j) = similarity measure (j, i). 
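
A minimal sketch of this computation is shown below. It assumes each document is already available as a Python set of terms; the three example documents and the function names are hypothetical.

```python
def jaccard(a, b):
    """Similarity measure |a ∩ b| / |a ∪ b| between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(term_sets):
    """Compute the upper triangle and mirror it into the lower triangle."""
    n = len(term_sets)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = jaccard(term_sets[i], term_sets[j])
            sim[j][i] = sim[i][j]   # similarity is symmetric
    return sim


# Hypothetical term sets S1, S2, S3.
docs = [
    {"web", "search", "index"},
    {"web", "index", "cluster"},
    {"fuzzy", "cluster"},
]
sim = similarity_matrix(docs)
for row in sim:
    print(row)
```

Only n(n-1)/2 calls to the similarity measure are made; the diagonal stays at zero so that a document is never selected as its own nearest neighbour.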


B. THE ALGORITHM 


The clustering algorithm which clusters together the similar 
documents is given below: 

Algorithm docum_clustering 
for f = 1 to k                  // for number of clusters 
begin 
    cf = Ø                      // initially cluster is empty with no document in it 
    for e = 1 to n/k            // for number of documents in one cluster 
    begin 
        for j = 1 to n 
            select max from sim[i][j] 
        cf = cf ∪ Si 
        D* = D* - Si 
        for l = 1 to n 
        begin 
            sim[i][l] = 0 
            sim[l][i] = 0 
        end 
    end 
end 


Figure 4: Algorithm for Clustering(docum_clustering) 


It may be noted that the algorithm starts with the first cluster, 
which is empty initially. The first document from the collection 
is considered and put in the first cluster. Now, using the 
similarity matrix, the document most similar to it is selected. 
All the entries of the row and column associated with the first 
document are then made zero, as this document cannot be added to 
any other cluster. The most similar document picked is put in 
the same cluster. Now the second document takes the role of the 
first document, the document most similar to it is selected, and 
this procedure repeats n/k times, until the first cluster is full. 
The remaining clusters are filled in the same way. Thus at the end, 
we get k clusters, each with n/k similar documents. 
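
The procedure described above can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation: it assumes that k divides n, that each new cluster starts from the lowest-numbered unassigned document, and it removes a document from a `remaining` list instead of zeroing its row and column (which has the same effect). The 6 x 6 matrix is hypothetical.

```python
def chain_clusters(sim, k):
    """Greedy chaining: follow the most similar unassigned document."""
    n = len(sim)
    size = n // k                        # assumes k divides n
    remaining = list(range(n))           # documents not yet in any cluster
    clusters = []
    for _ in range(k):
        current = remaining.pop(0)       # first document still unassigned
        cluster = [current]
        while len(cluster) < size and remaining:
            # most similar unassigned document to the current one
            nxt = max(remaining, key=lambda j: sim[current][j])
            remaining.remove(nxt)
            cluster.append(nxt)
            current = nxt                # the new document takes over
        clusters.append(cluster)
    return clusters


# Hypothetical symmetric similarity matrix for 6 documents.
sim = [
    [0, 2, 9, 1, 3, 4],
    [2, 0, 3, 8, 1, 5],
    [9, 3, 0, 2, 7, 1],
    [1, 8, 2, 0, 4, 6],
    [3, 1, 7, 4, 0, 2],
    [4, 5, 1, 6, 2, 0],
]
clusters = chain_clusters(sim, 2)
print(clusters)

# Reassign consecutive identifiers in cluster order.
order = [doc for cluster in clusters for doc in cluster]
new_ids = {doc: position + 1 for position, doc in enumerate(order)}
print(new_ids)
```

Documents in the same cluster end up with consecutive new identifiers, which is what keeps the d-gaps in the posting lists small.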


C. EXAMPLE ILLUSTRATING CLUSTERS 
FORMATION 

Having discussed the algorithm, let us now look at how the 
clustering of the documents takes place. For example, if we have 
10 documents - A, B, C, D, E, F, G, H, I, J - and the 
value of k is 2, i.e. 2 clusters are to be made, then according to 
the algorithm, the similarity among the documents is computed 
using the similarity measure and hence the formed upper 
triangular similarity matrix will be: 

Figure 5: Initial Similarity Matrix 

Now from the computed values in the upper triangular matrix, 
the matrix can be completed as follows using the property that 
similarity measure(i, j) = similarity measure(j, i). The full 
similarity matrix is given in figure 6. According to the 
clustering algorithm: 

1st cluster will have A, then E, then F, then H & lastly C 
2nd cluster will have J, then D, then G, then I & lastly B 











     A  B  C  D  E  F  G  H  I  J 
B    ?  0  5  4  6  2  3  5  7  8 
C    3  5  0  5  2  3  6  9  4  7 
D    6  4  5  0  2  3  6  5  4  9 
E    9  6  2  2  0  8  5  3  6  5 
F    8  2  3  3  8  0  8  9  5  2 
G    2  3  6  6  5  8  0  6  5  4 
H    3  5  9  5  3  9  6  0  3  6 

Figure 6: Full Similarity Matrix 















The output after calculating similarity for the first five documents 
will be: 

     A  B  C  D  E  F  G  H  I  J 
I    0  7  0  4  0  0  5  0  0  5 
J    0  8  0  9  0  0  4  0  5  0 


Figure 7: Matrix after formation of first cluster 

4. PROPOSED HIERARCHICAL CLUSTERING ALGORITHM 

The lack of a central structure and freedom from a strict syntax 
is responsible for making a vast amount of information 
available on the web, but retrieving this information is not easy. 
One possible solution is to create a static hierarchical 
categorization of the entire web and use these categories to 
organize the web pages. Organizing Web pages into a hierarchy 
of topics and subtopics facilitates browsing the collection and 
locating results of interest. In the hierarchical clustering algorithm, 
after the clusters of similar documents have been formed, the 
similar clusters are merged together to form the mega clusters 
using the same similarity measure as is used to cluster together 
the documents. The framework for the hierarchical clustering is 
shown in figure 8. 





Figure 8: Hierarchical Clustering 


A. COMPUTATION OF SIMILARITY MATRIX OF CLUSTERS 

The algorithm that computes the similarity matrix of the 
clusters is given below. In this algorithm, 

D = {S1, S2, ..., Sk} where Si is the set of terms in the cluster ci. 


Algorithm cluster_similarity 
for i = 1 to k 
begin 
    sim[i][i] = 0 
    for j = i+1 to k 
    begin 
        sim[i][j] = similarity measure (Si, Sj) 
        sim[j][i] = sim[i][j] 
    end 
end 





Figure 9: Algorithm for similarity matrix of clusters 


B. ALGORITHM FOR HIERARCHICAL CLUSTERING 
The hierarchical clustering [9] algorithm that aims at forming 
the mega clusters out of the similar clusters is given in figure 
10. 

In this algorithm, the first mega cluster is considered, which is 
initially empty. The first cluster from the collection is 
considered and put in the first mega cluster. Now, using the 
similarity matrix, the cluster most similar to it is selected. All 
the entries of the row and column associated with the first 
cluster are made zero, as this cluster cannot be added to any 
other mega cluster. The most similar cluster picked is put in the 
same mega cluster. Now the second cluster takes the role of the 
first cluster, the cluster most similar to it is selected, and this 
procedure repeats k/m times, until the first mega cluster is full. 
Now the second mega cluster is 
considered and the same procedure repeats until all the mega 
clusters are full. Thus at the end, we get m mega clusters, each 
with k/m clusters, such that the clusters within the 
same mega cluster are similar. 
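
The same chaining step can be sketched one level up, on clusters instead of documents. The sketch assumes that each cluster is summarised by the union of the term sets of its documents and that m divides k; the four term sets and the name mega_clusters are hypothetical.

```python
def jaccard(a, b):
    """Same similarity measure as used for documents."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mega_clusters(cluster_terms, m):
    """Group k clusters into m mega clusters of k/m clusters each."""
    k = len(cluster_terms)
    size = k // m                         # assumes m divides k
    remaining = list(range(k))
    result = []
    for _ in range(m):
        current = remaining.pop(0)        # first cluster still unassigned
        mega = [current]
        while len(mega) < size and remaining:
            nxt = max(
                remaining,
                key=lambda j: jaccard(cluster_terms[current], cluster_terms[j]),
            )
            remaining.remove(nxt)
            mega.append(nxt)
            current = nxt
        result.append(mega)
    return result


# Hypothetical term sets S1..S4, one per cluster.
cluster_terms = [
    {"java", "developer", "job"},
    {"nurse", "hospital", "job"},
    {"python", "developer", "job"},
    {"doctor", "hospital", "job"},
]
megas = mega_clusters(cluster_terms, 2)
print(megas)
```

Here the two developer clusters and the two hospital clusters end up in separate mega clusters, mirroring how similar clusters are merged at the next level of the hierarchy.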


for f = 1 to m 
begin 
    cf = Ø 
    for e = 1 to k/m 
    begin 
        for j = 1 to k 
            select max from sim[i][j] 
        cf = cf ∪ Si 
        D = D - Si 
        for l = 1 to k 
        begin 
            sim[l][i] = 0 
            sim[i][l] = 0 
        end 
    end 
end 


























Figure 10: Algorithm for mega clustering 
C. EXAMPLE ILLUSTRATING HIERARCHICAL CLUSTERING 
Suppose, for example, that a user fires the query “Jobs for Computer 
Engineers having 5 to 10 years experience”. Since the hierarchy of 
clusters has already been formed, the search will start from the 
super cluster “Job”, will be directed to the mega cluster 
“COMP. ENGG.”, then will reach the cluster “5 TO 10 YRS 
EXPERIENCE” and finally will reach the individual relevant 
documents. Thus the search follows a specific path from super 
cluster to the individual document as shown in figure 11. 
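
A minimal sketch of such a path lookup is given below. It assumes the hierarchy is held as nested dictionaries and that the query has already been mapped to the three labels; the sibling clusters, the document names and the function follow_path are hypothetical.

```python
# Super cluster -> mega cluster -> cluster -> documents (hypothetical data).
hierarchy = {
    "Job": {
        "COMP. ENGG.": {
            "5 TO 10 YRS EXPERIENCE": ["doc_17", "doc_42"],
            "0 TO 5 YRS EXPERIENCE": ["doc_03"],
        },
        "MECH. ENGG.": {
            "5 TO 10 YRS EXPERIENCE": ["doc_08"],
        },
    },
}


def follow_path(tree, path):
    """Descend one level of the hierarchy per label in the path."""
    node = tree
    for label in path:
        node = node[label]
    return node


found = follow_path(hierarchy, ["Job", "COMP. ENGG.", "5 TO 10 YRS EXPERIENCE"])
print(found)
```

Only the documents under the selected cluster are inspected; the other branches of the hierarchy are never visited.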


Figure 11: Search Path 

5. IMPLEMENTATION OF PROPOSED WORK 


For indexing the documents, we first have to parse the 
documents. After that, the similarity matrix is created and then the 
k means algorithm is applied for creating the clusters. Clusters 
are created at the first level. For creating clusters at the second 
level, the same procedure is applied again, and finally hierarchical 
clustering is done for indexing. 





Figure 11: Work Flow of Implementation 


A. SNAPSHOTS OF IMPLEMENTED WORK 


1. Given input & parsed data 

The following snapshot represents the parsed data, which is the 
initial step for indexing the data. 

2. Clustered data 

The data is now clustered according to the similarity of 
words. The following figure shows the clusters that are 
created after matching the similarity of documents with each 
other. 

A - Graph Representing the Created Clusters at First Level 
At the first level, fewer clusters are created, as the 
similarity between two documents is lower. 

Figure 14: Clusters created at first level 

B - Graph Representing the Created Clusters at Second Level 
At the second level, more clusters are created in comparison 
to the first level, as the similarity between two documents is higher. 

Figure 15: Clusters created at second level 


6. CONCLUSIONS 
In this paper, an efficient algorithm for computing a reordering 
of a collection of textual documents has been proposed, which 
effectively enhances the compressibility of the IF index built 
over the reordered collection. Further, the proposed hierarchical 
clustering algorithm aims at optimizing the search process by 
forming different levels of hierarchy. The proposed algorithm is 
superior to the other algorithms as a summarizing and browsing 
tool. A critical look at the literature indicates that, in contrast to 
the earlier proposed algorithms, the proposed work produces a 
better ordering of the documents and has the following advantages: 
1. Compression of Index Size: The size of the index is 
compressed as similar documents are assigned closer 
identifiers. 
2. Reduction in Search Time: The search time gets reduced as 
the search gets directed to a specific path from super cluster to 
mega clusters, then to clusters and finally to the individual 
documents. 
3. Fast retrieval of relevant documents: Since the similar 
documents get clustered together in the same cluster, the 
documents relevant to a specific query can be retrieved from 
that cluster. 

REFERENCES 
[1]. Fabrizio Silvestri, Raffaele Perego and Salvatore Orlando. 
“Assigning Document Identifiers to Enhance 
Compressibility of Web Search Engines Indexes” In the 
proceedings of SAC, 2004. 
[2]. Van Rijsbergen C.J. “Information Retrieval” 
Butterworth, 1979. 
[3]. Oren Zamir and Oren Etzioni. “Web Document Clustering: 
A feasibility demonstration” In the proceedings of SIGIR, 
1998. 
[4]. Jain and R. Dubes. “Algorithms for Clustering Data,” 
Prentice Hall, 1988. 
[5]. Sanjiv K. Bhatia. “Adaptive K-Means Clustering” American 
Association for Artificial Intelligence, 2004. 
[...] “... Clustering in Information Retrieval” IEEE ... 
Systems, Man and ... 
[8]. Chris Staff: Bookmark Category Web Page Classification 
... 2008: 345-348. 
[9]. Khaled M. Hammouda, Mohamed S. Kamel: Efficient 
Phrase-Based Document Indexing for Web Document 
Clustering. IEEE Trans. Knowl. Data Eng. 
16(10):1279-1296 (2004). 


Copy Right © BIJIT - 2011; Vol. 3 No. 2; ISSN 0973 - 5658 


BVICAM’s International Journal of Information Technology (BIJIT) 
Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi 


Web 3.0 in Education & Research 
Rajiv and Manohar Lal 
Submitted in May 2011; Accepted in July 2011 

ABSTRACT 
The continuous evolution of the Internet has opened 
unimaginable opportunities and challenges in web based 
education and learning. The traditional version of web, i.e. Web 
1.0, started as a Read only medium; the next version Web 2.0 
[...] evolving version of web, viz., Web 3.0 is said to be a 
technologically advanced medium which allows the users to 
[...] some of the thinking so far expected only from the human 
[...] and technologies for facilitating web based education 
& learning. To begin with, this paper discusses some 
definitions of Web 3.0, its evolution and characteristics. 
Next, we discuss the possible future Web 3.0 
technologies, trends, tools and services that will assist in the 
areas of online learning, personalization and knowledge 
construction powered by the Semantic Web. 


KEYWORDS 
Web 3.0, Semantic Web, Educational Technology, Online 
Learning, 3D learning environments, e-learning. 


1. INTRODUCTION 

For about the last two decades, the World Wide Web (WWW) has 
been used to improve communication, collaboration, sharing 
of resources, promoting active learning, and delivering 
education in distance learning mode. The WWW helps teachers 
in planning a suitable online delivery structure, sharing goals of 
[...] 

In recent years, many universities and educational 
institutions worldwide have offered online services, such as 
admissions and virtual (online) learning environments, in order to 
facilitate lifelong learning and to make this compatible with 
other educational management activities. For example, 
[...] may create a purely Web-based delivery system 
[...] learners may access web based material anytime from 
anywhere in the world, being connected through the Internet. 
Since the 1990s, when the World Wide Web was established, it 
has evolved from the earlier versions, viz. Web 1.0 to Web 2.0, 
and finally is evolving into the newest version, viz., Web 3.0. 

In respect of different versions of web, the Wikipedia states: 
“Web 1.0 is Read Only, static data with simple markup for 
reading. Web 2.0 is Read/Write dynamic data through web 
services customize websites and manage items. Web 3.0 is 
Read/Write/Execute.” In Web 2.0, the user not only reads 
information from the internet, but also provides information 
through the internet to share with others. Currently we have many 
popular Web 2.0 interactive applications like Blog, Podcast, 
Mashup, Tag, RSS/Atom, Wiki, P2P, Moblog, Adsense and so 
on. Compared with Web 2.0, there is not a very clear definition 
available for Web 3.0 till now. Web 3.0, to be discussed in 
detail below, is a term used to describe the future of the World 
Wide Web. Views of different pioneers on the evolution of 
Web 3.0 vary greatly. Some believe that emerging technologies 
such as the Semantic Web will transform the way the Web is 
used, and lead to new possibilities in artificial intelligence 
based applications. Other visionaries suggest that increases in 
Internet connection speeds, modular web applications, or 
advances in computer graphics will play the key role in the 
evolution of the new version of the World Wide Web [1]. 


2. DEFINITIONS OF WEB 3.0 

The term ‘Web 3.0’ was first coined by John Markoff of the 
New York Times in 2006 [6], and first appeared prominently 
in early 2006 in a blog article by Jeffrey Zeldman that was 
critical of Web 2.0 and associated technologies such as Ajax. 
Major IT experts and researchers support different 
approaches to the future Web. There is no complete agreement 
among the experts about how Web 3.0 will evolve. Below we 
discuss the opinions of pioneers in the field in this respect. 

Tim Berners-Lee coined the term Semantic Web, and promotes 
the concept of conversion of the Web into a big collection of 
databases [2]. 

About Web 3.0, Tim Berners-Lee [3] says: 

"People keep asking what Web 3.0 is. I think maybe when 
you've got an overlay of scalable vector graphics - everything 
rippling and folding and looking misty - on Web 2.0 and access 
to a semantic Web integrated across a huge space of data, 
you'll have access to an unbelievable data resource." 

Netflix founder, Reed Hastings [4], thinks that Web 3.0 would 
be a full video Web, as stated below: 


"Web 1.0 was dial-up, 50K average bandwidth; Web 2.0 is an 
average 1 megabit of bandwidth and Web 3.0 will be 10 
megabits of bandwidth all the time, which will be the full video 
Web, and that will feel like Web 3.0" 


Yahoo founder, Jerry Yang, thinks that the new era of tools & 
techniques for creating programs, data, content and online 
applications will blur the distinction between professional, 
semi-professional and consumers. At the TechNet Summit in 
November 2006, Yang stated [4]: 


“Web 2.0 is well documented and talked about. The power of 
the Net reached a critical mass, with capabilities that can be 
done on a network level. We are also seeing richer devices over 
last four years and richer ways of interacting with the network, 
not only in hardware like game consoles and mobile devices, 
but also in the software layer. You don't have to be a computer 
scientist to create a program. We are seeing that manifest in 
Web 2.0 and Web 3.0 will be a great extension of that, a true 
communal medium... the distinction between professional, semi- 
professional and consumers will get blurred, creating a 
network effect of business and applications.” 

Finally, we consider what Google's CEO, Eric Schmidt [5], stated: 

“Web3.0 as a series of combined applications. The core 
software technology of Web3.0 is artificial intelligence, which 
can intelligently learn and understand semantics. Therefore, 
the application of Web3.0 technology enables the Internet to be 
more personalized, accurate and intelligent." 

These are some of the views about Web 3.0 of different experts 
of the IT industry. Next, we discuss some of the characteristics of Web 
3.0. 


3. CHARACTERISTICS OF WEB 3.0 
Four characteristics of Web 3.0, as given below, can be 
summarized from the above definitions and descriptions. 


Intelligence: 

Experts believe that one of the most promising features of Web 
3.0 will be a Web with intelligence, i.e. an intelligent web. 
Applications will work intelligently with the use of Human- 
Computer interaction and intelligence. Different Artificial 
Intelligence (AI) based tools & techniques (such as, rough sets, 
fuzzy sets, neural networks, machine learning etc) will be 
incorporated with the applications to work intelligently. This 
means, an application based on Web 3.0 can directly do 
intelligent analysis, and then optimal output would be possible, 
even without much intervention of the user. Documents in 
different languages can be intelligently translated into other 
languages in Web3.0 era. Web 3.0 should enable us to work 
through natural language. Therefore, users can use their native 
language for communication with the others around the world 
[6]. 


Personalization: 

Another characteristic of the Web 3.0 era is Personalisation. 
Personal or individual preferences would be considered during 
different activities such as information processing, search, and 
formation of a personalized portal on the web. The Semantic Web 
would be the core technology for Personalisation in Web 3.0 
[7] [8]. 

Interoperability: 

In the context of Web 3.0, the terms Interoperability, 
collaboration and reusability are basically interrelated. 
Interoperability implies reuse, which is again a form of 
collaboration. Web 3.0 will provide a communicative medium 
for knowledge and information exchange. When a person or a 
software program produces information on the Web and this 
information is used by another, then the creation of new form 
of information or knowledge takes place [24]. Web 3.0 
applications would be easy to customize & they can 
independently work on different kinds of devices. An 
application based on Web 3.0 would be able to run on many 
types of Computers, Microwave devices, Hand-held devices, 
Mobiles, TVs, Automobiles and many others. Pervasive Web is 
the term used to describe this phenomenon where web is 
operable to a wide range of electronic devices. 


Virtualization: 

Web 3.0 would be a web with high speed internet bandwidths 
and High end 3D Graphics, which can better be utilised for 
virtualisation. The trend for future web refers to the creation of 
virtual 3-Dimensional environments. An example of the most 
popular 3-D web application of Web 3.0 is Second Life [7]. 


4. TECHNOLOGY TRENDS FOR WEB 3.0 

Based upon the above definitions, it is likely that the new 
generation of web applications will have some specific core 
technologies to support them. In this section, we present some 
of the major trends in terms of technologies that might become 
the building blocks of the next generation of the Web. Figure 1 
depicts the evolution of the web in terms of the core 
technologies, the content and services available to end users. 

Figure 1: Evolution of the Web 


Semantic Web: 

The extension of the World Wide Web that provides an 
efficient & easier way to share, find and combine data & 
information from distinct sources is called the Semantic Web. In 
the simplest terms, we can define the Semantic Web as a 
relationship between things, described in a manner which 
makes people and machines able to understand. We may say, 

Traditional World Wide Web = Web of Documents with ... 

Semantic Web = Web of Integrated, Linked meaningful Data. 

The Semantic Web is all about data integration. The Semantic Web 
converts “display only” data to meaningful information by 
using metadata. Ontologies, which contain the vocabulary, 
semantic relationships, and simple rules of inference and logic 
for a specific domain, are accessed by software agents. These 
agents locate and combine data from many sources to deliver 
relevant information to the user [23]. 
One of the objectives of the Semantic Web is to identify and 
provide the exact required data that matches the keywords 
provided by the user. For example, if we search for the keyword 
"data mining" through Google, Yahoo or any other search engine, 
millions of web pages appear as search results, out of which a 
few may have some relevant information and all other pages 
may be useless. Web 3.0 in terms of the Semantic Web is the third 
generation of the World Wide Web, in which machines will have 
the ability to read Web contents like human beings and also 
the ability to follow our directions. For example, if you ask it to 
check the show timings of a film in theaters within a 20 km 
radius that suit your preferred timings, then 
it follows the request and provides the appropriate information 
in respect of your preferences. 
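
The film example can be made concrete with a small sketch in which facts are stored as subject-predicate-object triples, the basic data model of the Semantic Web. Everything in the sketch is hypothetical: the film, theatre names, distances and helper functions are invented for illustration, and plain Python tuples stand in for a real triple store.

```python
# Hypothetical facts as (subject, predicate, object) triples.
triples = [
    ("show_1", "film", "Film X"),
    ("show_1", "theatre", "Theatre A"),
    ("show_1", "starts_at", "18:30"),
    ("show_2", "film", "Film X"),
    ("show_2", "theatre", "Theatre B"),
    ("show_2", "starts_at", "21:00"),
    ("Theatre A", "distance_km", 12),
    ("Theatre B", "distance_km", 35),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known fact."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known fact."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def nearby_shows(film, max_km):
    """Combine facts about shows and theatres to answer one request."""
    results = []
    for show in subjects("film", film):
        theatre = objects(show, "theatre")[0]
        if objects(theatre, "distance_km")[0] <= max_km:
            results.append((theatre, objects(show, "starts_at")[0]))
    return results


print(nearby_shows("Film X", 20))
```

Because the data carries explicit relationships, the answer is assembled from facts about shows and facts about theatres rather than from keyword matches in documents.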




The 3D Web: 

This trend of the future World Wide Web refers to the 
formation of virtual 3-dimensional worlds on the Web. 
3D graphics will be extensively utilized in the development 
of Web 3.0 tools or applications. High speed Internet, quicker 
processing speeds, higher screen resolutions, 3D gaming 
technology and augmented reality will transform Web 
browsing into a 3D experience, where you actually move 
through the virtual corridors of the Web, as a virtual avatar of 
your real self [2]. Recently several Internet-based elementary 
virtual worlds, such as Radar Networks [9], Second Life [11], 
IMVU [12], Active Worlds [13], and Red Light Center [10], 
have gained huge attention from the public worldwide. The number 
of users of these virtual worlds is growing every day. For 
instance, at the end of March 2008, Second Life had more than 
13 million accounts with around 38,000 users logged on at any 
particular moment [14]. These types of environments allow 
users to experience new things which they may never be able to 
have in their real life. Users create avatars on the Web and 
allow them to reside in the virtual worlds. The residents or 
avatars of these virtual worlds can explore, interact with other 
residents, socialize, participate in different activities, and create and 
serve different types of services. The possible interactions in 
these virtual worlds occur through text, chat messaging, audio 
chat, and/or video. 


The Social Web: 


The Social Web describes the interaction of people with one 
another using the underlying technologies of the World Wide Web. 
Technology advancements in Web 3.0 will take the current 
social computing to a new level called Semantic Social 
Computing or Socio-Semantic Web, which will develop and 
utilize knowledge in all forms, e.g., content, models, services, 
& software behaviors [15]. Semantic Web and, in general, 
Artificial Intelligence technologies will add underlying 
knowledge representations to information, tags, processes, 
services, software functionalities and behaviors. The wisdom of 
crowds will come not from the consensus decision of the 
group, but from the semantic and logical aggregation of the 
ideas, thoughts, and decisions of each individual in the group. 
Instead of linking documents only, the future Social Web will 
link people, organizations, and concepts automatically. 


The Media Centric Web: 

Most traditional search engines provide search results on 
the basis of text inputs. Web 3.0 searches will not be restricted 
to text based searches. The search engines would be able to 
take a media or multi-media object as input and search out 
related media objects based on its features [2]. For example, to 
search for images of cars, we would provide an image of a car 
as input and the search engine should be able to retrieve 
images of cars with similar features. The same kind of search 
possibilities should apply to other media objects such as 
audio and video. Work in this direction is already going on. 
Some good examples of this kind of technology can be found 
in software like the Ojos Riya [16] photo sharing tool, which 
automatically tags images using face recognition; similarly, 
the site Like.com [17] enables the user to search for products 
based on similar images. 
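
The idea of searching by features rather than by text can be sketched as a nearest-neighbour lookup over feature vectors. The sketch assumes that some feature extractor has already turned each image into a short numeric vector; the file names, the three-dimensional vectors and the use of cosine similarity are illustrative assumptions, not a description of the tools cited above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical catalogue: image name -> precomputed feature vector.
catalogue = {
    "red_car.jpg": [0.9, 0.1, 0.8],
    "blue_car.jpg": [0.8, 0.2, 0.7],
    "tree.jpg": [0.1, 0.9, 0.1],
}


def search(query_features, top=2):
    """Return the names of the `top` most similar catalogue items."""
    ranked = sorted(
        catalogue,
        key=lambda name: cosine(query_features, catalogue[name]),
        reverse=True,
    )
    return ranked[:top]


# Features of the query image (again hypothetical).
hits = search([0.85, 0.15, 0.75])
print(hits)
```

The query is itself a media object reduced to features, so the two car images rank above the unrelated one without any text being involved.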


The Pervasive and Ubiquitous Web: 

Remarkable developments in technologies such as wireless 
communications, wireless networking, mobile computing 
devices, artificial intelligence, software agents, enabling 
technologies (e.g., Bluetooth, BANs, PANs, 802.11 wireless 
LANs), embedded systems and wearable computers have led to the 
evolution of Pervasive & Ubiquitous computing platforms. 
According to Peter Robinson [25], ubiquitous and pervasive 
computing may be defined as the task of embedding small and 
mobile devices into existing IT and computing infrastructures, 
so that users can access and manipulate information 
where and when it matters, even while on the move. The scope 
and use of web services will not be limited to computers and 
mobiles; web services will be equally available in clothing, 
appliances, automobiles and much more. We need not 
invoke these services every time; they will work and perform 
their tasks themselves, cooperatively and automatically. The 
user's involvement in accessing and operating the devices would be 
almost nothing. For example, using the future web services we 
can have windows and curtains that check the weather and 
automatically open and close accordingly, and home appliances 
that know our daily routines and preferences and communicate 
with each other to provide us with more comfortable living [2]. 
As stated above, Web services would play an important role in 
this direction, and a device embeddable form of light weight web 
services will be required. The communication between 
different types of devices and the Web would be possible with 
the help of Service Oriented Architectures (SOA) and related 
technologies for ensuring cross-platform interoperability. Some 
leading software companies are working in this direction. 
Microsoft has released a development API [19]; also, it has 
released exceptional innovations featured product called Life 
Ware [19], which is an excellent example of what this 
technology can bring in the future [20]. 


5. TOOLS AND SERVICES OF WEB 3.0 FOR EDUCATION & RESEARCH 

Learning in Web 2.0 emphasizes the active participation of 
internet users and interaction among social communities, 
through social network tools or social software such as blogs, 
wikis, social bookmarking and social networking. The tools & 
services of Web 3.0 technologies would foster a more open 
approach to learning. Web 3.0 has been proposed as a possible 
future web consisting of the integration of high-powered 
graphics (Scalable Vector Graphics or SVG) and semantic data. 
There have also been discussions around 3-D social networking 
systems and immersive 3-D internet environments that will 
take the best of virtual worlds (such as Second Life) and 
gaming environments and merge them with the Web. 


About Web 3.0 in learning, Tony Bingham, ASTD President 
and CEO, says: 


“In the Semantic Web, content will find you—rather than (you) 
actively seeking it, your activities and interests will determine 
what finds you, and it will be delivered how you want it and to 
your preferred channel. The Semantic Web provides 
tremendous potential for learning.” 


We are in the beginning of a new revolution in information 
management and sharing that will make more and more content 
available to any combination of human and computer 
processing, allowing new means of collaboration between and 
across disciplines. 

Web 3.0 offers many tools and services for different kinds of 
web applications on the Internet, as shown in the figure below. 

Figure 2: Web 3.0 Tools & Services 


Next, we briefly describe some of the Web 3.0 tools and 
services which are useful for education and research: 


Learning with 3D-Wikis / Virtual 3D Encyclopedia: 

A Wiki is a system that allows one or more people to build up a 
collection of knowledge in a set of interlinked web pages, using 
a process of creating and editing pages. Wikis are playing a 
significant role in content creation, publishing, editing, 
revising, and collaborating for knowledge creation. Wikis are 
being used for maintaining and building a repository of content 
and material. Students are able to work collaboratively and post 
large items. The ease of use of the wiki software makes it a simple 
matter for an editor (faculty) to delete/revert or modify the 
content. With the evolution of the 3D web, researchers & 
technocrats have been working on new projects to bring a new 
dimension to the world of Wikis & encyclopedias. Some 
examples of this kind of technology can be found in software 
like Copernicus-3D Wikipedia (see http://copernicus.deri.ie) 
[18]. Suppose a learner has performed a search and chosen 
one of the results related to information about a specific 
geographical region; the camera will then move to the particular 
place on the spinning globe to present relevant audio/video 
information. For instance, the camera will "fly" towards the 
island of Ireland as a result of searching for Irish heritage park; 
eventually, the article about the Irish Heritage Park in 
Williamsburg will be presented to the user along with the video 
on the Irish heritage park [18]. 3D Wikis would be able to provide 
learners with a rich & effective environment involving all media and 
animation, so that they can have a better impact on 
learning & knowledge. 


Learning with 3D Virtual worlds & Avatars: 

As mentioned earlier, a 3D virtual world is a mix of 3D gaming
technology, augmented reality, and simulated environments powered
by Internet technology, where users interact through movable
avatars. Learners create their own avatars on the Web and let them
reside in the virtual worlds. Virtual worlds can
be seen as the beginning of a new era of e-learning, as they allow
learners to do role-play, 3D modeling, simulations and creative
work, and encourage their active involvement. There is huge scope for
conducting research on the pedagogical benefits of
teaching and learning in 3D virtual worlds. Recently several
web based 3D virtual worlds, such as Second Life [11], IMVU
[12], Active Worlds [13], and Red Light Center [10], have
gained the attention of students and teachers for education &
learning worldwide. Educators may conduct classes in a variety
of different settings within a 3D virtual world, where they can
interact in a lifelike classroom environment. Educators &
learners may collaboratively conduct sessions from
geographically dispersed locations in a shared virtual 3D space.
Virtual worlds allow educators & learners to conduct meetings,
seminars, presentations and digital exhibitions, where learners can
come and interact the same way we do in real life. 3D
virtual worlds available today and in the future will be very
helpful across a diverse range of disciplines including
education, medicine, business, commerce, science,
communication, media, art, architecture and design, law,
computer science, language learning, history and geography, to
mention but a few.


Intelligent Search Engines: 

In the last few years, learning processes have benefited from
the technological evolution of the web. The spread of the
web has permitted the introduction of new educational
processes, which are more flexible for accessing the resources
for learning. Nowadays the Internet has become the most useful
and powerful source of information. In order to deal effectively
with the huge amount of information on the web, advanced
web search engines have been developed for the task of
retrieving useful and relevant information in multimedia form
for their users [21]. When you use a traditional Web search
engine, the engine isn't able to really understand your search. It
looks for Web pages that contain the keywords found in your
search terms. The search engine can't tell if the Web page is
actually relevant for your search. It can only tell that the
keyword appears on the Web page. A Web 3.0-era agent-based
search engine could find not only the keywords in your
search, but also interpret the context of your request. It would
return relevant results and suggest other content related to your
search terms. Experts believe that Web 3.0 will provide users
with richer and more relevant experiences. Experts also believe
that with Web 3.0, every user will have a unique internet
profile based on that user's browsing history. Web 3.0 will use
this profile to tailor the browsing experience to each individual.
That means that if two different learners each performed an
internet search with the same keywords using the same service,
they would receive different results determined by their
individual profiles [22]. Students will also benefit from
knowledge construction powered by the Semantic Web. A
Semantic Web agent-based search engine will return a
multimedia report rather than just a list of hits. A smart agent
can return local lectures, relevant blogs, books and television
programs about the topic to the learner. Ontologies will link the
learner's needs and characteristics so that personalized agents
can search for learning material based on the learner's needs
[23]. Learners can apply the same kind of search possibilities
to other media objects such as image, audio, and video.
Some examples of this kind of technology can be found in
software like Ojos Riya, a photo sharing tool that allows users to
automatically tag images using face recognition [16], or
Like.com, which enables the user to search for products based
on similar images [16].
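
The contrast between plain keyword matching and profile-aware search described above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the corpus `DOCS`, the interest profiles, and the function names `keyword_search` and `personalized_search` are hypothetical and do not describe any real Web 3.0 search engine.

```python
# Hypothetical toy corpus: each document has a title, text and topic tags.
DOCS = [
    {"title": "Jaguar speed records",
     "text": "the jaguar car reaches high speed", "tags": {"cars"}},
    {"title": "Jaguar habitat",
     "text": "the jaguar lives in rain forest habitat", "tags": {"wildlife"}},
    {"title": "Forest birds",
     "text": "birds of the rain forest", "tags": {"wildlife"}},
]


def keyword_search(query, docs):
    """Traditional search: keep documents that contain every query keyword."""
    words = query.lower().split()
    return [d for d in docs if all(w in d["text"].split() for w in words)]


def personalized_search(query, docs, profile):
    """Keyword hits re-ranked by overlap with a user's interest profile."""
    hits = keyword_search(query, docs)

    def score(doc):
        # Sum the profile weights of the tags attached to the document.
        return sum(profile.get(tag, 0.0) for tag in doc["tags"])

    return sorted(hits, key=score, reverse=True)


# Two hypothetical learners issuing the same query.
driver = {"cars": 1.0}
biologist = {"wildlife": 1.0}

print([d["title"] for d in personalized_search("jaguar", DOCS, driver)])
print([d["title"] for d in personalized_search("jaguar", DOCS, biologist)])
```

Both learners get the same two keyword hits, but in a different order, which is the behaviour the text attributes to profile-based search.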


Online 3-D Virtual Labs / Educational labs / Simulations or
3D Web:
3D rich graphical user interfaces will act as a powerful
platform for the users to participate and perform collaborative
activities, sharing results and exchanging media information
among participants in a more natural way [26]. The following
are some examples of how 3-D Virtual Labs / Educational
labs / Simulations may be used in
future education:


* To visit places that are not accessible: Visiting different
places in virtual worlds would benefit learners in many ways.
Students can reach ancient places virtually in a small span
of time. For example, when taking a look at ancient places
like the Taj Mahal, the Red Fort or Rome, students can interact with &
experience the environment of the places and other students,
and can have their teacher as a guide through the web. Similarly,
they can see the Egyptian pyramids or visit an Egyptian village
in the same way. There is much scope for teaching
students this way and giving them a safe and economical way of
experiencing such things.
* To promote student collaboration: Students can come
together & meet virtually in diverse and attractive ways.
They can collaborate & work on common projects. Students &
educators may have discussions, talk, connect, and chat on the
common projects. Additionally, they can fly over and move between
multiple 3D worlds instantaneously.

For instance, students can do research and create a (virtual) village
in, say, the Roman Empire. Additionally, a whole group of
students around the world could create this environment while
working together on a project & experience interesting ways
of learning at a distance.


* To develop scenarios and simulations: High end graphics and
rich 3D internet applications can be utilized to make simulation
based environments or labs where learners can learn or
do experiments. These labs are so-called dry labs. These Web
based labs can prove to be quite beneficial for online learners.
They could go to an immersive virtual science lab to do
experiments. After the simulation, students could go offline
into a real science lab to perform the correct experiment and
see how it works [27]. High level scientific experiments could
be conducted, and expert technical training could be obtained,
in ways that a university or school could not afford. For
example, imagine splitting atoms, conducting surgery, flying a
plane or exploring inhospitable environments.


6. CONCLUSION
Web 3.0 is more than a set of useful and new technologies and
services. Web 3.0 technologies offer an array of services to
make a true online classroom a reality. Because of its very
nature, Web 3.0 services will have a positive impact on
teaching and learning. Web 3.0 technologies offer the benefits of
3D-wikis, 3D Labs, Intelligent Agent based search engines,
Virtual environments like Avatar and Semantic Digital
Libraries etc. In our vision of the Web 3.0, we foresee a
scenario where such ubiquitous technologies will create a
convergence of real and virtual environments, where the user
will seamlessly interact with humans and machines either
through virtual means or in the real world. These benefits can
be directly aligned to the existing best practices in online
education, and lead to a more authentic and effective
educational environment.
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ABSTRACT 

Digital divide refers to a substantial asymmetry between two
or more populations in the distribution and effective use of 
information and communication resources. Despite the boom 
in the availability of access to communication resources since 
the beginning of the 1990s, the divide is deepening and the 
differences in the usage of communication resources between 
countries and regions intensifying. Even though the rural 
areas have benefitted to some extent from the boom in access 
to communication resources, the regional divide is more 
pronounced within the developing countries. Factors
influencing the digital divide vary from region to region. In an
attempt to find the factors responsible for the digital divide in 
Jammu and Kashmir region, a pilot survey was conducted. 
This paper reports on the results of this pilot study. The study 
was conducted by floating questionnaires and by interviewing 
people of rural as well as urban areas. Questions were related 
to internet access, its usage, problems faced in its use etc. On
analysis of the data, several observations beyond the
digital divide factors have been reported. It was found during
the study that the government is providing facilities for
internet access but awareness of these initiatives is still
lacking. People residing in rural areas are hesitant to use
the internet due to lack of English language proficiency. This
paper is a result of the pilot survey to examine the factors 
responsible for the regional digital divide and will help in 
suggesting methods to bridge this divide. 


KEYWORDS: ICT, Digital Divide, Internet, Community 
Information Centers, Common Service Centers. 


1. INTRODUCTION 

The world we live in has been changing rapidly with the 
emergence of the ubiquitous society bringing forward 
extraordinary benefits and opportunities together with new 
challenges. The ability to create and utilize information plays 
a significant role in the economic and social structure of our 
lives. Greater awareness of the importance of information in 
defining our future has compelled nations across the world to 
commit themselves to the progressive development of ICT 
industries. On the other hand, ICT development has also
deepened the problem of a serious digital divide between
developed and developing countries. The digital revolution
has facilitated a fast transition from the industrial economy to
the IT network-based information economy, causing the
resulting digital divide to deepen economic disparities or
polarization in wealth. The digital divide affects many
nations of the developing world. The term encompasses
inadequate funding, a lack of necessary computer and Internet
skills, and a lack of English-language proficiency that hinder
expansion and use of digital information resources.

The rest of the paper is structured as follows: a brief
introduction to the Digital Divide, formulation of the hypothesis,
methodology of data collection, a brief introduction to the
questionnaire, data analysis and finally the conclusion of the
paper are presented.


2. DIGITAL DIVIDE 

Information and Communication Technologies (ICTs) can be 
both a unifying and a divisive force. Its divisive aspect is 
known as the "digital divide", which relates to the difference 
between those who have digital access to knowledge and those 
who either lack it or don't use it effectively. The digital divide 
can be defined as the gap between individuals, households, 
businesses and geographic areas at different socio-economic 
levels with regards both to their opportunities to access ICTs 
and to their use of the Internet for a wide variety of activities. 
As the Internet has rapidly grown to underlie almost every
aspect of the global economy, the term "digital divide" has
often been used to refer to Internet access. It is a divide that
affects and reinforces fundamental economic and social
divides between and within countries and is threatening to
further exacerbate these inequalities. Those who are
“connected” are at a greater advantage in terms of
competing on a global basis, increased share in the market,
increased knowledge, increased productivity and higher
income; those who are not face low GDP, increased unemployment
and deepening marginalization. Developing countries and
non-privileged groups have difficulty in "connecting" and
difficulty in using Information Technology (IT) effectively
because of any one or more of the following: illiteracy,
poverty, low level of skills, high cost of access, and even poor
mastery of the English language [4].


2.1 DIGITAL DIVIDE NOTIONS 

The digital divide is a problem of multiple dimensions.
Kling (1998) sees the divide from (1) a technical aspect
referring to availability of the infrastructure, the hardware and
the software of ICTs, and (2) the social aspect referring to the
skills required to manipulate technical resources. Norris
(2001) describes (1) a global divide revealing different
capabilities between industrialized and developing nations;
(2) a social divide referring to inequalities within a given
population; and (3) a democratic divide allowing for different
levels of civic participation by means of ICTs. And
Keniston (2003) distinguishes four social divisions: (1) those
who are rich and powerful and those who are not; (2) those
who speak English and those who do not; (3) those who live in
technically well-established regions and those who do not;
and (4) those who are technically savvy and those who are not.


2.2 DIGITAL DIVIDE: INDIAN PREVIEW

India, a union of states, is the second most populous nation in
the Asian region behind China. The country has achieved
impressive progress in the field of science and technology and
is emerging as one of the strongest economies in the
developing world. Information and communication
technologies have brought significant changes in the development
of Indian society through information dissemination. In
India, the benefits of IT are beginning to be seen and the
impact of these benefits is creating a great change. It is also
true that the use of digital technologies in the world has not
only improved people's day-to-day life but has also divided
the world into information rich and information poor, i.e. the
information haves and have-nots. The unequal access to
information and communication technologies has led to a
massive digital divide. Although India has been one of the
emerging super powers in IT, the benefits have been
remarkably slow to arrive, particularly in rural and remote areas.
Besides socio-economic factors, geographic, educational and
attitudinal factors have been some of the challenges for the
government when introducing IT-oriented programs.
Although underserved communities in India are gaining
access to computers and the Internet, their benefits are limited
because of the following factors: political instability,
infrastructural barriers, literacy and skill barriers, economic
barriers, content barriers and linguistic diversity. One
formidable obstacle to ICT diffusion is language. There is a
self-perpetuating cultural hegemony associated with ICTs
(Keniston, 2002). By the year 2000, only 20% of all Web sites
in the world were in languages other than English, and most of
these were in Japanese, German, French, Spanish, Portuguese,
and Chinese. But in the larger regions of Africa, India, and
south Asia, less than ten percent of people are English-literate
while the rest, more than two billion, speak languages that are
sparsely represented on the Web. Because of the language
barrier the majority of people in these regions have little use
for computers. Those who do not use computers have little
means to drive market demand for computer applications in
their language.


3. HYPOTHESIS 
On the basis of the above review of literature, the following
hypothesis was framed for the study:

H: The following are the factors responsible for the regional
Digital Divide:

i) Internet Access
ii) Unawareness of the ICT programmes and the advances in technology
iii) Linguistic Diversity
iv) Internet Cost

4. METHODOLOGY FOR DATA COLLECTION
The objective of the pilot study was to elicit, through
questionnaires and interviews, the major barriers to the use of
the internet. Convenience sampling was used to collect data for
the pilot study: some people were chosen from rural areas
and some from the urban areas of Jammu. Interviews and
questionnaires were used as tools to extract the required
data. The survey included questions on telephone service,
household income, race, age, educational attainment,
geographic region, language preferred to read and write,
computer ownership, access to technical resources, interest in
obtaining access, and attitudes toward technology. Location
of internet access, reasons for using the Internet,
information needs and the way in which people use
information were also studied.


5. GENERATION OF SCALE ITEMS 
The questionnaire was designed after an intensive literature
review. Questions were based on the problems studied in the
literature, such as availability of resources (telephone,
computer, internet etc.) at home and office, internet access
after school/office, awareness of e-services, availability of
internet access points such as Community Information Centers
(CICs) and Cyber Cafés, knowledge of e-services, problems
faced in using the internet etc. The questionnaire consisted of
35 questions, of which 11 covered the demographic profile; in
the remaining questions the respondents were requested to
select the response that best indicated their answer to each
statement, using a five point Likert scale where 1=Strongly
Agree, 2=Agree, 3=Indifferent, 4=Disagree, 5=Strongly
Disagree. The sample size for the pilot study was conveniently
taken as fifty. The questionnaire is shown in the appendix.
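
The five point coding described above can be sketched in Python as follows. This is a minimal illustration: only the label-to-number coding comes from the text, while the factor names, the `factor_means` helper and the sample answers are hypothetical and are not the survey data.

```python
from statistics import mean

# Coding taken from the questionnaire description:
# 1 = Strongly Agree ... 5 = Strongly Disagree.
LIKERT = {
    "Strongly Agree": 1,
    "Agree": 2,
    "Indifferent": 3,
    "Disagree": 4,
    "Strongly Disagree": 5,
}


def factor_means(responses):
    """Map each factor name to the mean of its coded Likert responses."""
    return {
        factor: mean(LIKERT[answer] for answer in answers)
        for factor, answers in responses.items()
    }


# Hypothetical answers for two factors (illustration only).
sample = {
    "access": ["Strongly Agree", "Agree", "Agree", "Strongly Agree"],
    "cost": ["Agree", "Indifferent", "Disagree", "Indifferent"],
}

print(factor_means(sample))
```

With this coding, a lower mean indicates stronger agreement with the statements for that factor.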


6. DATA ANALYSIS 

The pilot survey was conducted to study the factors 
responsible for the digital divide in Jammu and Kashmir. 
Other observations made during data analysis include division 
in the usage of ICTs along the line of Gender, education and 
age. The details about the various digital divide factors studied 
in the literature were found to be: 


6.1 INTERNET ACCESS
Workplace (office, school, college etc.) was found to be the
most common place for internet access. Most offices and
schools provide internet (broadband) access; therefore people
engage in internet activities at work. Only 22% of the
respondents did not have internet access at work.

Figure 1: Availability of Internet Connection at Workplace


It was found that the respondents who did not have
internet provision at work either used the internet at home or
visited a cyber café. Out of these respondents, 55% had an
internet connection at home, 27% visited a cyber café, and 18%
had never used the internet. Respondents who had
never used the internet stated that there was no need for them
to use it, i.e. they were not aware of the activities they could
engage in on the internet.

The mean value for access as a factor in the digital divide
came to 1.59, indicating it to be a factor responsible
for the digital divide (t = 1.076, p > .05).


Figure 2: Internet usage options for those without internet
access at workplace (legend: Home; Cyber Café & CIC; Do not use)


6.2 AWARENESS OF GOVT. INITIATIVES

e-Government generally refers to the delivery and
administration of Government products and services over an
IT infrastructure, such as the provision of information 
electronically using Internet portals, online tax assessment and 


. jme.nicin, jandkbenlcoin etc. Banks also have their own 


websites to deliver information and services. The Government has
also opened Community Information Centers (CICs) and
Common Service Centers (CSCs) to provide internet access to
the people.


• Community Information Centers (CICs)

The Government of J&K has opened up 135 Community 
Information Centers (CICs) in various locations for internet 
access at nominal rates. The CICs provide some basic services 
that include internet browsing, e-mail, printing, data entry, 
word processing and training for the local populace on the 
fundamentals of computers. Some or all of these services are 
provided by all CICs. In addition, a large number of CICs 
offer several services with a G2C orientation. Services offered 
by CICs may be classified into five main categories, namely: 


• Common Service Centers / Khidmat Centers (CSCs)
Common Service Centers / Khidmat Centers are centers
opened by Jammu & Kashmir Bank where people can avail all the
basic banking services it offers. They help the bank to deliver core
banking services to the people at their door-step while
bringing more and more public spaces within the fold of
formal financial channels. Besides, these centers create
employment at the grass-root level and open up opportunities for
youth from rural areas. Calculations show
that only 32% of the respondents were aware of these
facilities and only 10% had visited them. It is clear that the
government is taking initiatives to provide internet access to
people, but lack of awareness of these
initiatives is a major obstacle to bridging the digital
divide. The government must frame policies to make people
aware of such initiatives, so that these efforts show good
results. The fundamental problem of extending access to all
in a society and all geographic areas still remains. There is a
need to open more such centers to increase the access rate.

The mean value for the awareness parameter came to
2.86 on the five point scale, which reflects that lack of awareness is
contributing to the digital divide. The hypothesis also stands accepted as
there is no significant difference between the expected and observed values
(t = 1.206, p > .05).
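
The t values reported in this section compare an observed mean score with an expected value. The paper does not state the exact test or the expected value used, so the following Python sketch simply assumes a one-sample t-test; the helper `one_sample_t`, the scores and the expected mean of 2.5 are hypothetical and are not the survey data.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, expected):
    """One-sample t statistic: (sample mean - expected) / standard error."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(scores) - expected) / standard_error


# Hypothetical coded Likert responses (1 = Strongly Agree ... 5 = Strongly Disagree).
scores = [1, 2, 3, 4, 5]
t = one_sample_t(scores, expected=2.5)
print(round(t, 3))
```

For a sample of fifty respondents (49 degrees of freedom) the two-sided critical value at the .05 level is roughly 2.01, so t values such as those reported here would not indicate a significant difference between the observed and expected means.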


6.3 COST 

Significant changes have taken place in the
telecommunications policy and market in India in the last few
years. Favorable government policies and lower costs have
created a platform for rapid growth. This boom in the
telecommunication industry has led to a drop in
communication costs. Telephone and internet today are
affordable, yet the mean value for cost as a factor in the digital divide
came to 2.70. The hypothesis again stands accepted as
there is no significant difference between the expected and observed values
(t = .559, p > .05).
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6.4 LANGUAGE 
Language is the primary vector for communication. Less than 
5 percent of people can either read or write English (Census 
2001). Only a small, rich, successful and English speaking 
minority in India is ‘connected’. Lack of English language 
proficiency has created a 'computer fear'. On discussing 
issues relating to use of e-services, most respondents 
mentioned language issues. In spite of the availability of all
the information online, people visit government offices to seek
information. The lack of software and instructions in minority
languages also presents a huge barrier to ICT adoption.
The mean value for language as a factor influencing the
digital divide came to 2.22. The hypothesis also stands
accepted as there is no significant difference between the expected and observed
values (t = 1.532, p > .05).


6.5 Descriptive Statistics


Table 1: Mean Score of Access, Awareness, Cost and Language

Figure 3: Factors responsible for the digital divide





7. OTHER OBSERVATIONS:

Some other observations that were made during the pilot
survey are discussed below:

• Qualification makes a major contribution to the usage of the
internet. It was observed that most users of e-services
(such as e-billing, e-shopping, e-ticketing etc.) were
professionals or technically educated. They believe online
activities save time and are also hassle free. Other
respondents mostly engaged in entertainment activities
such as chatting, downloading music, surfing etc. They
were hesitant to use services that required monetary
transactions due to lack of trust.

• Gender makes no significant contribution to the digital
divide. There is no gendered difference in the usage of
computers. On data analysis, it was noticed that the 50
families questioned consisted of a total of 227 people; of
these, 114, i.e. 50.2%, had knowledge of computers, and 50 out
of the 114 were females, accounting for 48.9% of the computer
literates.
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Figure3 : Female computer users 


• Activities done online differ by age and also by profession.
People of different age groups engage in different
activities online. The most common online activities include
emailing, chatting, music/movie download, games and
social networking, matrimony and search engines. E-services
that include monetary transactions are used mainly by
professionals. Activities also differ from region to region.





The following table shows the ranking of internet activities in
rural and urban areas.


Fig 4: shows the ranking of Internet Services 


8. DISCUSSION AND CONCLUSION: 

The digital divide is a multifaceted problem. This paper
reports on the factors responsible for the digital divide,
according to a pilot survey conducted in J&K. Much of the
effort against the digital divide is focused on extending
telecommunication infrastructure and supplying terminals to
users. However, illiteracy and a lack of communication and IT
skills are major components of the digital divide and must be
considered and addressed alongside efforts to expand the
physical network. The factors found in the study are
similar to those in the literature review. Many initiatives
have been taken to provide internet access; costs have also
been cut down to make ICTs affordable. Attempts have been
made to make the web language free, yet the digital divide
remains. We need to develop models of collaboration among
researchers, social scientists, technologists, etc. so that local
requirements are met in technology innovation.
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9. LIMITATIONS AND FUTURE SCOPE 

It is necessary to recognize the limitations of the current study. 
One limitation is the small sample size. To examine the digital 
divide factors accurately, a larger sample is desirable. Another 
limitation is the convenience sampling method used. Future
research needs to focus on a larger cross section of internet
users by employing more diversified samples.
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12. APPENDIX A-LIST OF FIGURES 

Fig1 shows the percentage-wise availability of Internet
Connection at Workplace and confirms that the internet is mostly
used at the workplace.

Fig2 shows Internet usage options (home, CIC, cybercafé)
for those without internet access at the workplace.

Fig3 shows the female computer users, accounting for 49% of
the computer literates in the sample.


APPENDIX B-LIST OF TABLES 
Table 1 shows the descriptive statistics, i.e. the mean score of
the digital divide factors (Access, Awareness, Cost and
Language).
Table 2 shows the area-wise (Rural/Urban) ranking of Internet
Services.


APPENDIX C-QUESTIONNAIRE USED IN THE PILOT 
SURVEY 


The Questionnaire used for the pilot survey is given below:
ABSTRACT

Banks in India have invested heavily in the deployment of
information technology (IT) in the past decade. IT over the
years has become a business driver rather than a business
enabler. Sustainable development of banks depends heavily on
effective use of IT. This calls for measuring the effectiveness of
IT in these banks. This paper identifies the economic methods of
measuring IT effectiveness on the basis of a review of literature
on the subject.

KEYWORDS
Information Technology (IT), effectiveness, sustainable
development, economic methods

1. INTRODUCTION
In the past decade banks in India have invested heavily in
information technology. Total expenditure incurred on
computerization and development of communication networks
by public sector banks (PSBs) alone between September 1999
and March 31, 2009 is Rs. 17897 crore. Today, information
technology seems to be the prime mover of all banking
transactions. Trends show that banks in India have been
endeavoring to leverage technology to bring about
improvements in: quality of customer services, scale and
specialization in products, alternative sources of income
particularly from fee-based services, geographical reach
through communication networks and electronic delivery
channels, risk management practices, housekeeping, internal
control systems and regulatory compliance, and cost
efficiencies and scale economies. In other words, banks in
India started perceiving IT as a tool to achieve improvement
in efficiency (more output with less input) and
effectiveness (outcomes). An indication of the extent of
investment and percolation of IT in different categories of
banks is evident from the data presented in Table 1.

Table 1.1: IT Percolation in Banks in India (as on March 2009) * Estimated amount
Source: RBI’s Report on Trend and Progress of Banking in India, 2008-2009

It is clear from the data shown in Table 1 that banks have
invested heavily over the years in information technology
systems. Looking at the dependence of banks on IT, there is no
doubt that IT over the years has become a business driver rather
than a business enabler. It is clear that the sustainable
development of banks depends heavily on effective usage of IT.
Therefore measuring IT effectiveness is a major concern
of management today. In our paper we have identified the
methods of measuring IT effectiveness by reviewing the earlier
studies on the subject. Earlier studies have measured IT
effectiveness using both economic as well as non-economic
measures. Our paper reviews only those studies which have
used economic methods for measuring IT
effectiveness.

2. REVIEW OF LITERATURE 

Although banking is among the most IT-intensive industries 
and among those that started early to rely massively on 
computers for their operations, the large bulk of applied 
literature on bank technology includes very few studies on 
this topic, mainly because of the paucity of appropriate 
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main findings, is presented in Table 2. 


... significant in 1989 and in 1994, the effects of IT on
cost efficiency were small.


IT contributed to a reduction in demand deposits and an
increase in time deposits. IT also helped to increase
other loans and decrease installment loans. IT
was also responsible for saving labor.


There was a 17 to 23 percent increase in 
productivity with the use of computers. The returns 
were very modest compared to the levels of IT 
investments. 


Inefficiency in IT-related value added activities
always led to overall inefficiency. Around 64
percent of units that had efficient IT-related activity
also had perfect overall efficiency.


Additional investment in IT capital had no real 
benefits and may be more of strategic necessity to 
stay within the competition. However the results 
indicated that there were substantially high returns 
when investment in IT labor was increased. 


Micro-environment in the branches had an effect on 
their efficiency and urban branches had better 
efficiency than rural branches. 

Higher performance levels had been achieved 
without corresponding increase in the number of 
employees. Also operating expenses of the banking 
system had declined during the study period, 
indicating the positive impact of computerization. 


E-business capital and e-business as well as non-e-business
labor made positive contributions to [illegible]


Applying the developed model on the data of 22
banks for the period 1987 to 1989, they concluded
that the IT budget was not efficiently utilized in the
study period.


Low operational efficiencies existed in the banking
industry during the study period, 1996 to 2000.
These inefficiencies were ascribable to a
combination of both wasteful overuse of
information technology resources and inappropriate
scale of information technology investments.


Japanese banks    Cobb-Douglas    IT capital has either positive or no effect on productivity
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Technique(s) Used 





Table 2: Analysis of the Earlier Studies on Impact of IT on Banks 


3. METHODS OF MEASURING IT EFFECTIVENESS 
A literature survey reveals that, along with performance
ratios, econometric and linear programming approaches are
available to measure IT effectiveness. Performance ratios
are widely used in all sectors of business. The best known
ratios are those for financial and production managers. The financial
ratios regarding liquidity, capital adequacy, earnings and
liability are widely used measures of organizational
performance. In the banking sector, intermediation cost,
interest spread, operating expenditure, cost to income ratio,
return on assets, return on equity, business per employee,
income per employee and business per branch, among others,
are some of the commonly used ratios for assessing the
efficiency and productivity of a banking unit. However, they
have disadvantages: (i) each single ratio must be compared
with some benchmark ratio one at a time; (ii) while the
calculation of a set of financial ratios is relatively easy, the
aggregation of those ratios can be quite complicated and
requires experienced judgment; (iii) financial ratios do provide
information on the overall financial performance of an
organization, but provide little information about the amount
by which performance could be improved or the area where
effort should be focused in order to improve performance; (iv)
ratio analysis also fails to consider the multiple input-output
characteristics of business enterprises and cannot give an
overall clear picture of organizational operations, because firm
performance may exhibit considerable variation depending on
the indicators selected. Given these disadvantages of ratios
as a performance measurement technique, attention in the recent
banking literature has mostly been directed to the
latter two techniques of frontier efficiency analysis, namely the
econometric approach and the linear programming approach,
which can provide comprehensive insights beyond those
available from financial ratio analysis for evaluating and
improving IT effectiveness.

After the seminal study by [Farrell, 1957], methodological
development in frontier efficiency analysis has been growing
at a rapid pace. Presently, a multitude of techniques,
parametric and nonparametric, stochastic and deterministic, are
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Investment in IT services from external providers 
(consulting services, implementation services, 
training and education, support services) had a 
positive influence on accounting profits and profit 
efficiency, while the acquisition of hardware and
software reduced banks' performance.

Private sector banks had a slight edge over their 
industry counterparts during the study period of 
2001 to 2006. Further, on the technology front as 
well as in exercising managerial control, substantial 
scope existed for improvement, across the sector. 


available for performance measurement. The essential
differences among these techniques lie in the differing
assumptions used in estimating the shape of the frontier and
the distributional assumptions imposed on the random error
and inefficiency.

There are at least five different types of approaches in the
literature that have been employed in measuring IT
effectiveness. Of those, three are econometric approaches, i.e.
stochastic frontier approach (SFA), distribution-free approach
(DFA) and thick frontier approach (TFA), which are
parametric, and two are linear programming approaches, which
are nonparametric, i.e. data envelopment analysis (DEA) and
free disposal hull (FDH). Each of the approaches has
weaknesses as well as strengths relative to the others. The
literature has not yet come to a consensus about the preferred
approach for determining the best-practice frontier against
which relative efficiencies are measured. In general,
parametric approaches are stochastic and distinguish the
effects of inefficiency from the effects of noise. A key
drawback of parametric approaches is that they usually
specify a particular functional form that presupposes the
shape of the frontier. If the functional form is misspecified,
measured effectiveness may be confounded with the
specification errors. In sharp contrast to parametric
approaches, nonparametric approaches impose no functional form on the
frontier. They are deterministic and do not allow for random
error owing to luck, data problems or other measurement
errors. If random errors do exist, measured effectiveness may
be confounded with these random deviations from the true
efficiency frontier. Most studies on banking have used
either the SFA or the DEA approach to calculate effectiveness.
Both the DEA and SFA approaches have their individual 
strengths and weaknesses. The SFA approach has the 
advantage of allowing for random shocks and measurement 
errors. Another advantage of the SFA approach is that it is 
possible to analyze the structure, and investigate the 
determinants of, producer performance. Therefore, it has a 
more solid grounding in economic theory. On the other hand, 
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weaknesses with the whole family of econometric approaches 
to efficiency measurement (to which SFA belongs) are: (i) it is
risky to impose a priori assumptions on the production
technology by choosing a functional form (e.g. Cobb-Douglas,
translog, etc.) given that most of the distributional
characteristics of the production technology are a priori
unknown; (ii) the precise specification of the error structure is
difficult (sometimes even impossible) to ascertain, and
such specification is likely to introduce another potential
source of error; (iii) the continuity presumed in this approach
may lead to approximation errors.

Compared with the stochastic parametric frontier approach,
DEA has advantages in measuring the relative efficiency of
banks. First, DEA is a non-parametric frontier approach and
does not require rigid assumptions regarding production
technology or a specific statistical distribution of the error
terms. Second, DEA is amenable to small sample studies.
Third, as a non-parametric frontier technique, DEA identifies
the inefficiency in a particular bank by comparing it to similar
banks regarded as efficient. Other DEA advantages are
[Banker and Morey, 1986, Sengupta, 1988]: identification of
bad from good performers by generating an overall, easy to
interpret efficiency score; independent measurement units
(giving great flexibility in selecting outputs/inputs); and
manipulation of uncontrollable, environmental factors, e.g.
competition. However, the DEA model does not allow for
measurement error or random shocks. Instead, all these factors
are attributed to (in)efficiency, a characteristic that inevitably
leads to potential estimation errors.
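
The deterministic, nonparametric idea can be illustrated with free disposal hull (FDH), the simpler of the two linear programming approaches named above. The sketch below is not taken from any of the reviewed studies: the function name, the choice of an input-oriented score and the three example banks are assumptions made purely for illustration. Each unit is compared only with observed units that produce at least as much of every output, and its score is the smallest proportional input contraction that such a benchmark shows to be feasible.

```python
def fdh_input_efficiency(inputs, outputs):
    """Input-oriented FDH score per unit; 1.0 means no observed unit dominates it.

    inputs[k] and outputs[k] are tuples of positive numbers for unit k.
    """
    scores = []
    for x_o, y_o in zip(inputs, outputs):
        best = 1.0  # every unit is its own benchmark with ratio 1
        for x_j, y_j in zip(inputs, outputs):
            # Unit j is a valid benchmark only if it produces at least
            # as much of every output as the unit under evaluation.
            if all(yj >= yo for yj, yo in zip(y_j, y_o)):
                # Proportional input level needed to match unit j.
                ratio = max(xj / xo for xj, xo in zip(x_j, x_o))
                best = min(best, ratio)
        scores.append(best)
    return scores


# Hypothetical banks: inputs = (IT capital, labor), outputs = (loans,)
bank_inputs = [(10, 5), (20, 10), (8, 12)]
bank_outputs = [(100,), (100,), (90,)]
scores = fdh_input_efficiency(bank_inputs, bank_outputs)
```

Here the second bank uses twice the inputs of the first for the same output, so it is scored as inefficient. Because no noise term is involved, any measurement error in the data would flow directly into these scores, which is the weakness of deterministic approaches noted above.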


4. CONCLUSION 

In this paper the author, on the basis of a review of the literature, has
identified methods of measuring IT effectiveness in banks of
India. There are at least five different types of approaches in
the literature that have been employed in measuring IT
effectiveness. Of those, three are econometric approaches, i.e.
stochastic frontier approach (SFA), distribution-free approach
(DFA) and thick frontier approach (TFA), which are
parametric, and two are linear programming approaches, which are
nonparametric, i.e. data envelopment analysis (DEA) and free
disposal hull (FDH). Most studies on banking have used
either the SFA or the DEA approach to calculate effectiveness.
Advantages and disadvantages of each method are also
discussed in the paper.
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ABSTRACT 
In database systems, a user makes a query and that query is
answered by the DBMS. Generally, there are a variety of
methods for computing the response to a given query. It is
the responsibility of the query processor to transform the query
as entered by the user into an equivalent query that can be
computed more efficiently. Query optimization is the process of
finding a good strategy, or the best query evaluation plan, for
processing a query. In today's competitive environment,
query optimization is one of the important criteria on
which one can compare the available commercial RDBMSs.
The objective of this study is to discuss the techniques used
by Oracle for optimizing queries and to present a
comparative study of the various costs involved in executing
LIKE operator based queries. This comparative study is based
on an empirical study done on Oracle 8i, Oracle 9i and Oracle
10g.
KEYWORDS 
Query Optimization, Oracle, Cost Control, LIKE Operator, 
Pattern Matching in DBMS 


INTRODUCTION
Query optimization is one of the important issues in database
systems. A query may be expensive in terms of cost of
execution if it is not optimized. In centralized database
management systems, an efficient query processor tries to
minimize the utilization of computing resources, such as
storage space and processor time. In a distributed environment,
apart from storage space and processor time, the costs of
communication delays, setups and transmission have to be
minimized. Total cost and response time are good
measures for comparing the cost of queries in terms of resource
consumption. [DB01]

Oracle is one of the most popular and efficient commercial
RDBMSs. Oracle claims that it uses both a rule-based query
optimizer and a cost-based optimizer. The goal of the cost-based
optimizer is the best throughput (i.e. the least amount of
resources necessary to process all rows accessed by the SQL
statement). Also, Oracle claims to optimize a statement with
the goal of best response time (i.e. the least amount of
resources necessary to process the first row accessed by a SQL
statement). In general, it uses the cost-based approach. Oracle
Corporation is continually improving its cost-based optimizer.
The rule-based approach is available for backward
compatibility with legacy applications. [WO01]
The objective of this study is to present the comparative
study of the LIKE operator of the query optimizers used in
Oracle 8i, 9i and 10g by using a large volume of hypothetical
data.


1. METHODOLOGY 

This study is based on hypothetical data, generated
using an algorithm that produces random strings of variable
length. The table has 11 columns and is populated with 10^5
records. Each column contains strings composed of letters of the
alphabet and blank spaces, with a maximum of 100 characters per
column.
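
The generation algorithm itself is not given in the paper. A minimal sketch of one way to produce such data is shown below; the use of uppercase letters, the minimum string length of 1, the fixed seed and the function names are all assumptions made for illustration.

```python
import random
import string

# Assumption: uppercase letters plus the blank space.
ALPHABET = string.ascii_uppercase + " "


def random_row(rng, n_columns=11, max_len=100):
    """Return one record: n_columns random strings of 1..max_len characters."""
    return [
        "".join(rng.choice(ALPHABET) for _ in range(rng.randint(1, max_len)))
        for _ in range(n_columns)
    ]


def generate_rows(n_rows, seed=0):
    """Yield n_rows records; a fixed seed makes the data set reproducible."""
    rng = random.Random(seed)
    for _ in range(n_rows):
        yield random_row(rng)


# A small sample; the study itself used 100000 records.
sample_rows = list(generate_rows(50))
```

Using a seeded generator means the same table can be rebuilt for every database version under test, so that differences in measured cost are not caused by differences in the data.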

For the analysis, queries based on the LIKE predicate are
executed on the table (both without and with an index) on
different versions of Oracle. This study broadly covers
three versions of Oracle, i.e., Oracle 8i, Oracle 9i, and Oracle
10g. The query execution plan and response time are observed
and analyzed with the help of different Oracle tools.


2. THEORETICAL ASPECT OF QUERY
OPTIMIZATION
Query optimization is the process of deriving a number of
query-evaluation plans to execute the query and selecting the most
efficient plan. It is the responsibility of the query optimizer to
come up with a least-cost query-evaluation plan that computes
the same result as the given relational-algebra expression (or,
at least, is not much costlier than the least-costly way). There
are several optimization criteria that have to be taken into
consideration at the time of optimization. [GH01] [GH02]
To find the least-costly query-evaluation plan, the optimizer
needs to generate alternative plans that produce the same result
as the given expression, and to choose the least-costly one.
Generation of query-evaluation plans for an expression
involves three steps:

1. Generating logically equivalent expressions using
   equivalence rules
2. Annotating resultant expressions to get alternative
   query plans
3. Choosing the cheapest plan based on estimated cost

* Rule-Based Optimizer: Rule-based optimizer generates
the equivalent optimal query evaluation plan by using the
equivalence rules for the given relational algebraic query.
It generates expressions equivalent to a given expression
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by means of equivalence rules that specify how to 
transform an expression into a logically equivalent one. 
The optimization based on equivalence rules is very 
expensive in space and time. [SK01] 

* Cost-Based Optimization: A cost-based optimizer
generates a range of query-evaluation plans from the given
query by means of equivalence rules, and chooses the one with
the least cost. In general, with n relations, there are
(2(n-1))! / (n-1)! different join orders. The brute-force method
introduces a large overhead in the optimization process
because it evaluates the cost of each evaluation plan
separately, compares their costs and selects the least-cost
query-evaluation plan. This overhead can be reduced by
applying dynamic programming, which can
also be used to find the optimal query-evaluation plan.
[SK01] The cost of an operation depends
on the size and other statistics of its inputs. To estimate
the cost of an operation, some statistics about database
relations are required, which are stored in database-system
catalogs. The statistics have to be updated every time a
relation is modified so that accurate statistics can be
maintained. Updating the statistics may incur a
substantial amount of overhead. [GH01] [GH02]

* Heuristics-Based Optimization: The cost of optimization
is the major drawback of cost-based optimization, even
with dynamic programming. Heuristics can be used to reduce
the number of choices that must be made in a
cost-based fashion, which reduces the cost of
optimization. Some systems use only heuristics; others
combine heuristics with partial cost-based optimization.
[GH01] [GH02]

* Materialized Views: The materialized view is one of the
concepts which can also be used for query optimization. A
materialized view is a view whose contents are computed
and stored and which, like indices, can be used to speed up
query processing. If base relations are modified, then
incremental maintenance is needed to update
these views efficiently. In query optimization, materialized views are
treated just like regular relations. [GH01] [GH02] Most
database systems provide tools to help the database
administrator with index and materialized view selection.
These tools examine the history of queries and updates,
and suggest indices and views to be materialized. The
Microsoft SQL Server Database Tuning Assistant, the
IBM DB2 Design Advisor, and the Oracle SQL Tuning
Wizard are examples of such tools.
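
To give a sense of why brute-force enumeration is costly, the join-order count (2(n-1))! / (n-1)! quoted under cost-based optimization can be evaluated directly. The short sketch below is only an arithmetic illustration; the function name is an assumption and nothing here reflects how Oracle enumerates plans.

```python
from math import factorial


def join_orders(n):
    """Number of different join orders for n relations: (2(n-1))! / (n-1)!."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return factorial(2 * (n - 1)) // factorial(n - 1)


# Growth of the search space for a few values of n.
counts = {n: join_orders(n) for n in (3, 5, 7, 10)}
```

For 3 relations there are 12 join orders, for 7 there are 665280, and for 10 the count already exceeds 17 billion, which is why dynamic programming and heuristics are used to prune the search.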


3. QUERY OPTIMIZATION IN ORACLE

A large variety of processing techniques are supported by
Oracle in its SQL processing engine. The SQL processing engine
has four main components: parser, optimizer, row source
generator, and SQL execution engine (see Figure 1).


Copy Right © BIJIT - 2011 Vol. 3 No. 2 ISSN 0973 - 5658





Figure 1: SQL Processing Architecture (adapted from 
[WO01]) 

The syntax and semantic analysis of the SQL
statements is done by the parser. The optimizer uses internal rules
and/or costing methods to determine the most efficient way of
producing the result of the query. The optimizer returns an
optimal query plan for execution. Oracle provides two
types of optimizers: cost-based optimizer (CBO) and
rule-based optimizer (RBO). The execution plan for the SQL
statement is generated by the row source generator with the 
help of the query plan received from the optimizer. The 
execution plan is a collection of row sources structured in the 
form of a tree. Each row source returns a set of rows for that 
step. Each row source produced by the row source generator is 
executed by the SQL execution engine to produce the results 
of the query. 

By default, optimizer mode is cost-based optimizer with the 
goal of best throughput. Also, Oracle can optimize a statement 
with the goal of best response time. In general, Oracle uses the 
cost-based approach but it also supports rule-based approach. 
Oracle Corporation is continually improving the cost-based 
optimizer and adding the new features which will work only 
with the cost-based optimizer. The rule-based approach is 
available for backward compatibility with legacy applications 
[WD01]. 

The cost-based architecture is shown in Figure 2. Using the
parsed query, the query transformer determines whether it is
advantageous to change the form of the query so that it enables
generation of a better query plan. Four different query
transformation techniques are employed by the query 
transformer: view merging, predicate pushing, sub-query 
unnesting, and query rewrite using materialized views. [GH01] 
[GH02] Any combination of these transformations might be 
applied on the received parsed query. 
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The estimator generates three different types of measures: 
selectivity, cardinality, and cost. The selectivity, which
represents a fraction of rows from a row set, lies in the value
range 0.0 to 1.0. There are several types of cardinality
measures: effective, join, distinct, and group cardinality. The
cost represents units of work or resource used in performing an
operation. The cost-based optimizer uses disk I/O, CPU usage,
and memory usage as units of work. The plan generator
computes the cost of different possible plans for a given query
and selects the one that has the lowest cost.
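
How the three measures relate can be sketched in a few lines. The example below is a simplified illustration and not Oracle's estimator: the uniform-distribution rule for equality predicates (selectivity = 1 / number of distinct values), the function names and the plan costs are assumptions chosen only to show the flow from selectivity to cardinality to plan choice.

```python
def equality_selectivity(num_distinct):
    """Assumed rule: values are uniformly distributed over the distinct values."""
    return 1.0 / num_distinct


def estimated_cardinality(num_rows, selectivity):
    """Expected number of rows returned: row count times selectivity."""
    return round(num_rows * selectivity)


def cheapest_plan(plan_costs):
    """Pick the plan name with the lowest estimated cost."""
    return min(plan_costs, key=plan_costs.get)


# Hypothetical figures: 100000 rows, 4 distinct values in the column.
selectivity = equality_selectivity(4)
cardinality = estimated_cardinality(100000, selectivity)
chosen = cheapest_plan({"full table scan": 120.0, "index range scan": 15.0})
```

The selectivity always falls between 0.0 and 1.0, the cardinality follows from it, and the plan generator's final step reduces to a minimum over the candidate costs.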





Figure 2: Architecture of Cost-Based Optimizer (adapted from
[WO01])


In Oracle, a large set of processing techniques is used in one
engine. These include many different join and index methods, parallel
query, etc., e.g., nested loops, sort-merge, and hash joins, anti-joins,
semi-joins, B-tree, bitmap, reverse, functional, and domain
indexes, clusters and hash clusters, index-organized tables,
nested tables, materialized views, partitioning, join indexes,
index joins, index skip-scan, etc. No single technique
is best for everything. Oracle provides many different ones,
and the optimizer determines which is best for each
individual query. [WD01]


4. EXPERIMENT


4.1 The Environment
All the experiments are performed on the same machine, on
which Microsoft Windows 2000 Advanced Server Version 5.0
with Service Pack 4 is installed. The experiments are
performed on three different versions of Oracle, that is, Oracle
8i Release 8.1.7.0.0, Oracle 9i Release 9.0.1.0.0 and Oracle
10g Release 10.1.0.0.0.
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4.2 The Queries 

A table consisting of 11 columns of VARCHAR2 data type is
used for the experiment. Each column can store a string of
maximum 100 characters. The table is populated with 100000
records. The following 8 queries are used in the experiments
for observing the EXPLAIN PLAN and SQL Trace results.
Experiments are performed both without and with an
index.


SELECT C1 FROM T1 WHERE C1 LIKE
'SVMPNODLHHCRWPEPS





Table 1: List of SQL Statements used for Experiments 


5. COST COMPARISON AMONG ORACLE 8i, 9i AND
10g


The costs of the queries are compared on the basis of the following 5
parameters:
CPU Time = CPU time in seconds executing
Elapsed Time = Elapsed time in seconds executing
Disk = Number of physical reads of buffers from disk
Query = Number of buffers gotten for consistent read
Current = Number of buffers gotten in current mode
(usually for update)


The various measured costs of above mentioned eight queries 
are represented in the Table 2 to Table 9 and Figure 3 to 10 
respectively. The overall measured cost of all eight queries is 
represented in Table 10 and Figure 11. 
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Note; * - denotes value of respective columnar is multiplied by 100 
"Table 2: Cost Comparison of Query 1 
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Figure 3: Cost Comparison of Query 1 
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Figure 5: Cost Comparison of Query 3 





Note: * - denotes value of respective column is multiplied by 100


Table 3: Cost Comparison of Query 2 
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Table 5: Cost Comparison of Query 4 
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Figure 4: Cost Comparison of Query 2 




An Empirical Evaluation of LIKE Operator in Oracle








Table 9: Cost Comparison of Query 8














Legend: Without index (in 8i); With index (in 8i); Without index (in 9i);
With index (in 9i); Without index (in 10g); With index (in 10g);
With Bitmap index (in 9i); With Bitmap index (in 10g)








Figure 10: Cost Comparison of Query 8 
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Table 10: Comparison of Total Costs of All 8 Queries 
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Figure 11: Comparison of Total Costs of All 8 Queries 








6. SUMMARY AND CONCLUSIONS 

The study was completed taking into consideration the
different parameters mentioned in its objective.
It includes both descriptive and empirical study,
and the conclusions drawn have far-reaching
implications. On the basis of the study, I conclude:


* The cost of FIRST_ROWS (response time) and
ALL_ROWS (throughput) is the same in all three
versions of Oracle. This might be because, in ALL_ROWS,
Oracle computes the statistics based on the blocks available in
the SGA.

* In the case of a table without an index, exact match and pattern
matching have the same cost in all three versions of
Oracle, because Oracle performs a full table scan (comparing
each row) if no index is available on the queried
column(s).

* In Oracle 9i, the cost of the evaluation plan is almost double
the cost in Oracle 8i. This cost in Oracle 9i might be
decreased if the size of RAM is increased so that disk
swapping can be reduced.

* In Oracle 10g, estimated statistics and computed statistics
show different results, whereas in Oracle 8i and
Oracle 9i both estimated and computed statistics show
the same results. This means Oracle 10g uses a different
approach for estimating and computing statistics as
compared to Oracle 8i and Oracle 9i.

* In Oracle 10g, the cost of the execution plan is less than in
Oracle 9i but slightly greater than in Oracle 8i. This may
be because Oracle 10g manages memory more
efficiently than Oracle 9i but
has more overhead than Oracle 8i.

* In the case of a non-indexed table, CPU time and elapsed time
in Oracle 8i are less than in Oracle 9i. Oracle 9i
is basically a rewrite of Oracle 8i with some more
features, which involve more overhead.

* In the case of an indexed table, CPU and elapsed time in Oracle
9i are less than in Oracle 8i. In Oracle 9i, the indexes
are more optimized than those of Oracle 8i.

* CPU time in Oracle 10g is more than that of Oracle 8i and
Oracle 9i, but for elapsed time the reverse holds. Oracle 10g
has grid support. It assumes that all the queries are
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distributed queries therefore it requires more CPU time as 
compared to Oracle 8i and Oracle 9i. 

* In the case of a bitmapped indexed table in Oracle 9i, elapsed
time is less than that of a normal indexed table, but for
CPU time the reverse holds.

* In the case of a bitmapped indexed table in Oracle 10g, CPU
time is less than that of a normal indexed table, but for
elapsed time the reverse holds.

* Bitmapped indexes in Oracle 9i are more optimized than
those of Oracle 10g.


The above conclusions are based on hypothetical data. These
may differ in an actual environment with a larger volume of data
and/or a different system configuration. The experiments were
performed using 256 MB of RAM; if a larger amount of
RAM is used, the results may vary slightly.

Further, the results indicate that the query optimizer
of Oracle 9i may only be a rewrite of the
Oracle 8i query optimizer rather than an implementation of any
new algorithm (as the results do not differ by much). Some
more features have been added to Oracle 9i. The indexes in
Oracle 9i might be more optimized than those of Oracle 8i.
Oracle 10g has grid support, which is beneficial in a distributed
environment. Due to the grid support, a lot of overhead is
included in Oracle 10g. Oracle 10g may be assuming all
queries to be distributed queries; therefore it requires more
CPU time. From the observations of the experiments, it has
been noticed that a prefixed substring in the LIKE operator
requires a 'full table scan' irrespective of whether an index is used
or not. Therefore, there is scope to improve the cost of LIKE
'%.....%' by designing new implementations for this operator.
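
A common explanation for this behaviour, offered here as background rather than as a finding of the experiments, is that an ordered index can narrow the search only when the beginning of the string is known. The sketch below uses a sorted Python list as a stand-in for an ordered index; it is not Oracle's implementation, and the function names and sample values are assumptions.

```python
from bisect import bisect_left


def prefix_search(sorted_values, prefix):
    """Analogue of LIKE 'prefix%': binary search, then a short range walk."""
    matches = []
    # All strings starting with the prefix are contiguous in sorted order
    # and begin at the first position not less than the prefix itself.
    for i in range(bisect_left(sorted_values, prefix), len(sorted_values)):
        if not sorted_values[i].startswith(prefix):
            break
        matches.append(sorted_values[i])
    return matches


def substring_search(values, fragment):
    """Analogue of LIKE '%fragment%': every value has to be inspected."""
    return [value for value in values if fragment in value]


index = sorted(["ABC", "ABD", "XAB", "BAB", "AB"])
by_prefix = prefix_search(index, "AB")
by_substring = substring_search(index, "AB")
```

The prefix search touches only the matching range, while the substring search reads the whole list, because matches such as "BAB" and "XAB" can sit anywhere in the ordering. Any new implementation for '%.....%' patterns would need a different access structure from a plain ordered index.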
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ABSTRACT 

Innovative activity underpins economic productivity and
growth. Countries that generate innovation, create new
technologies, and encourage adoption of these new
technologies grow faster than those that do not. In some
industries patenting is identified as the most important means
of protecting IP and is increasingly used as a strategic asset by
companies to create sustainable competitive advantage,
although in others secrecy is used to safeguard proprietary
knowledge. The basic purpose of this paper is to examine the impact
of patent filing on the economic growth of a country, leading to
sustainable development of the economy. For this, the paper
analyzed and tested the data of 9 countries for a period of 10
years (2000-2009). The results were mixed in the case of Asian
countries: only technology-based countries' economies were
affected by patent applications filed.

KEYWORDS
Intellectual Property Rights (IPR), economic growth, Gross
Domestic Product (GDP), Asian countries, Patents, Sustainable
development


INTRODUCTION
Intellectual Property Rights (IPR) are legally enforceable rights
relating to creations of the mind and include inventions, literary
and artistic works, and symbols, names, images, and designs
used in commerce. A number of individual rights are covered
by IP, such as patents, trademarks, copyrights, designs and trade
secrets. [1] For sustainable development, economic growth of
the country is essential. Patenting new inventions is one of
the ways to achieve economic growth. Recent history seems to show
that technology and knowledge are important factors for
economic growth and development. Since the creation of the
first mechanism to protect inventions in the 15th century, the patent
system has evolved with a view to promoting innovation and
encouraging economic development. By offering exclusive
rights for a limited period, it allows an inventor to recover R&D costs
and investments [30]. A patent for an invention is granted by
the government to the inventor. When a patent is granted, the right
becomes the property of the inventor, which, like any other
form of property or business asset, can be bought, sold, rented
or hired. The patent is not a monopoly, but gives the inventor
the right — normally for 20 years from the date when the patent
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application was first filed — to stop others from making, using 
or selling the invention without the permission of the inventor. 

Patents provide great strength to technology-driven
companies across the world and also help in creating wealth for
the economies of all developed, developing and least-developed
countries. Many researchers have reported that there is a direct
and/or proportionate relationship between patent registration
and the economic growth of a country. This article examines this
relationship between a country's percentage GDP (Gross
Domestic Product) growth and the percentage change in the
patent applications filed among selected Asian countries for the
period of 10 years (2000-2009), resulting in its sustainable
development.


REVIEW OF LITERATURE 

Intellectual property helped make possible the conditions for
innovation, entrepreneurship and market-oriented economic
growth that shaped the 20th century. Intellectual property
protection is increasingly a critical enabling tool [31]. The
contribution of technological innovation to national economic
growth has been well established in the economic literature,
both theoretically as well as empirically [27]. Many studies have
shown that there is a relationship between the number of
patent applications filed and the economic growth of a country.
Patents are a better performance variable but also suffer from
serious limitations. Patents can be expected to reflect conditions
(red tape, financial sector quality, etc.) that affect the decision to
innovate [32]. Porter and Ketels argue that true competitiveness
is measured by economic productivity (determined by capital
intensity, labour force skills and total factor productivity) and that
productivity growth is influenced by trade, investment and
innovative activity. They suggest that countries' economies, in
terms of their characteristic competitive advantage and modes
of competing, evolve through various stages, namely the
factor-driven stage, the investment-driven stage and the
innovation-driven stage. All these stages are defined on the basis
of their competitive advantage. [2]

Another study [3] revealed that there is an evident relationship
between Intellectual Property Rights (IPR) and sustainable
development of the country. The author analysed the recent
developments and indicated that there are an increasing number
of links between intellectual property protection and sustainable
development which need to be addressed. A number of studies
have empirically demonstrated the ability of weaker IPRs in
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'stimulating domestic innovative activity in developing 
‘ countries. In fact stronger IPRs may actually adversely affect 
‘innovative activity by stifling the absorption of knowledge 
spillovers that are important determinants of innovative 
activity. More and more researchers have endogenously 
determined by technical change resulting from decisions of 
profit-maximising agents. Some authors provide surveys of 
such innovation and R & D based endogenous growth models 
: [25] [26]. The OECD report on "Intellectual Property as an 
Economic Asset" [4], which draws on Kaplan and Norton [5], 
highlights the fundamental role IP plays in business 
‘performance and economic growth in knowledge-based 
° economies. The report points out that, increasingly, a large 
proportion of the market value of a company is determined by 
its intellectual assets — which, as intangible assets, have 
, monetary value and add to the company's balance sheet to 
increase enterprise value. Indeed, substantial value placed on 
' patents [6] and patenting innovations substantially increases (up 
| to 47%) the value realized from them. [7] 
The most recent of these studies have expanded the analysis to
include economic growth as measured by per capita output (GDP)
[28]. One author developed an error correction model to
determine the equilibrium rate of entrepreneurship as a
function of the stage of development of an economy. The idea of
the equilibrium rate has its roots in the choice between
self-employment and wage-employment that exists in the labour
market. Using data for 23 OECD countries, this study derived
the equilibrium rates of entrepreneurship and showed that
deviations from these rates significantly and negatively
influence GDP growth. In a related area, another author [29]
applied this formulation to study the impact of small business
prevalence and reached a similar conclusion. Any country
deviating from the equilibrium rate of entrepreneurship incurs
a penalty in terms of foregone economic growth. In this way,
depending on whether a country's actual rate of
entrepreneurship is above or below its equilibrium rate, there
is technically both a negative and a positive relationship
between economic growth and the rate of entrepreneurship.

In an important contribution [8], the authors compiled an index
of patent rights for 60 countries for the period 1960-90. The
GP (Ginarte and Park) Index focused only on patent rights as
published in law, with no attention to enforcement.
Nevertheless, the index has been widely applied in subsequent
studies as a measure of the strength of the national patent
rights regime. The authors used the index to study the relation
of economic growth, investment and R&D expenditure to patent
rights. They found no relationship between stronger patent
rights and economic growth. However, among richer countries
(with above-median income), stronger patent rights were
positively related to investment and R&D. There was no such
relation among poorer countries.


OBJECTIVE

This article discusses the relationship between two variables,
the growth rate of patent applications filed and the GDP growth
rate, among 9 selected Asian countries. The basic objectives
are:

1. To find out the relationship between the growth rate of
   patent applications filed and the GDP growth rate.
2. To identify the salient features of the Asian countries
   which make them patent friendly or restrict them from
   competing with other Asian countries in terms of patent
   applications filed and economic growth.

RESEARCH METHODOLOGY

This article selected 9 Asian countries as a sample, namely
India, China, Japan, Indonesia, Brunei, Vietnam, Singapore,
Malaysia, Thailand and Philippines. These countries were
selected randomly out of all Asian countries. A correlation was
computed over 10 years of records of both patent applications
filed and GDP growth rate for all 9 Asian countries. The
article's hypothesis is that there is a direct relationship
between the number of patent applications filed and GDP growth
rate. That is:

H0: There is no relationship between Patent applications filed
growth rate and GDP growth rate.

H1: There is a direct relationship between Patent applications
filed growth rate and GDP growth rate.

For testing this hypothesis, Student's t-test was used, as it
is one of the most appropriate techniques for testing
correlation in small samples.
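
As an illustration of the test described above, the following
minimal Python sketch computes the Pearson correlation
coefficient r between two series and the associated statistic
t = r * sqrt((n - 2) / (1 - r^2)), which has n - 2 degrees of
freedom. The function names and the five data points are
hypothetical and chosen only for illustration; they are not the
study's data. For 10 annual observations (8 degrees of freedom)
the one-tailed 5% critical value is approximately 1.86, which
is consistent with the threshold quoted in the data analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Illustrative (made-up) growth rates, not taken from the article.
patent_growth = [1.0, 2.0, 3.0, 4.0, 5.0]
gdp_growth = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(patent_growth, gdp_growth)
t = t_statistic(r, len(patent_growth))
print(f"r = {r:.4f}, t = {t:.4f}")
```

H0 would be rejected when t exceeds the critical value for the
chosen significance level and degrees of freedom.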


Patent rights in different Asian countries

Regarding the present IP scenario in Asia, it has been observed
that almost every region in Asia Pacific has at some point or
other been accused of not providing adequate protection to IP
rights. It is also a fact that most countries in Asia Pacific
that have developed strong technological capabilities,
including Korea, Taiwan, China and India, have built their
capabilities on the basis of poor IP rights enforcement [9].
Since that study, things have changed considerably. Many
changes took place in the laws and by-laws of countries
worldwide. Our sample countries also went through a few changes
which helped them strengthen their position in terms of secured
patents and hence increased the number of patents filed at
present. These changes have had varied impacts on the economic
conditions of those countries.

India

There is a well-established statutory, administrative and
judicial framework to safeguard intellectual property rights in
India, whether they relate to patents, trademarks, copyright or
industrial designs. As far as patents are concerned,
recognition of patents was first provided in 1856 by the
British government on the basis of the United Kingdom Act of
1852. After many modifications in 1872, 1888, 1911 and 1949,
the first independent Act was passed in 1970 by a Joint
Committee of the Indian government. The Patents (Amendment)
Act, 1999 was passed by the Indian Parliament on December 20,
1999 to amend the Patents Act of 1970; it provides for the
establishment of a mailbox system to file patents and accords
exclusive marketing rights for 5 years. The Act was amended
again through the Patents (Second Amendment) Bill, 2002 to
further amend the Patents Act, 1970 and make it TRIPS
compliant. The third amendment was made in 2004 through the
Patents (Amendment) Ordinance, 2004, with effect from 1st
January 2005. All these amendments had a great impact on the
number of patent applications filed. This can be seen in Table
1A and Table 1B.





Table 1A: [...] Asian countries

Table 1B: Number of patent applications filed among the
Asian countries


The irony is that the increase in the number of patent
applications did not much affect the GDP growth rate of India,
because India is an agriculture-based economy rather than a
technology-based economy. The difference can be seen in Fig. 1.

Figure 1: Comparison of Patent application filed and GDP
growth rate of India

China 


The Chinese history of patenting starts in 1985, when the first
Chinese patent law was framed. In 1992, after the signing of
the Sino-US MOU (Memorandum of Understanding) on the protection
of IPR, the Patent Law was reframed in a more protective
manner. It was further amended in 2000, leading to a huge
number of patents registered in China, with a growth rate of
63%. Since then, China has gone through many changes in its IPR
laws year after year; the last amendment, made in 2009,
included Utility Models and Design patents. At present, China
is in a very strong position of technical advancement, along
with the highest GDP growth rate in Asia. The relationship
between the number of patent applications filed and GDP growth
can be seen in Fig. 2.

Figure 2: Comparison of Patent application filed and GDP
growth rate of China


Japan 

The first Japanese patent law was established in 1871, although
it was abandoned within a year. The proper functioning of
patent law is therefore considered to date from April 18, 1885,
when the Patent Monopoly Act was enacted. In 1978, Japan
acceded to the Patent Cooperation Treaty (PCT). In 1980, the
JPO adopted the International Patent Classification, discarding
its own patent classification [10]. In 2002, the Japan Patent
Office declared computer programs patentable. The Japanese
system operates on a first-to-file basis. Although Japan is a
tech-savvy country, the last decade has seen a decline in
patent application filing. This is also affecting GDP growth,
as can be seen in Figure 3.

Figure 3: Comparison of Patent application filed and GDP
growth rate of Japan

Indonesia 

In Indonesia, patent law was first introduced in 1991. After
the ratification of TRIPs (Trade Related Intellectual Property
Rights), an amended patent law was introduced in 2001. Despite
all the amendments and membership of many conventions, applying
patent law in Indonesia has not been easy. Its IP protection is
still one of the weakest in the world [11]. According to the
Indonesian Patent Office, the number of patents registered
varies greatly from year to year. In 2009 it was 96, compared
to 21 in 2008 [12]. In this country, patent applications do not
have much impact on the GDP growth rate.

Figure 4: Comparison of Patent application filed and GDP
growth rate of Indonesia


Singapore 

The Patents Act came into force on 23 Feb 1995 and provided
Singapore with its own patent system. The Patents Act (Cap.
221) and its subsidiary legislation, which consists of the
Patents Rules, the Patents (Patent Agents) Rules, and the
Patents (Composition of Offences) Regulations, form the
legislation governing patent law in Singapore [13]. Whilst it
is not mandatory to apply for patent protection in Singapore
first before seeking patent protection overseas, any person
resident in Singapore is required to obtain written
authorization from the Registrar of Patents for an invention
before he files, or causes to be filed, outside Singapore an
application for a patent for that invention. Singapore is one
of the developed countries in the WIPO list. It is a
technology-based country, so patents registered and GDP growth
rate are highly correlated with each other. This can be seen in
Figure 5.

Figure 5: Comparison of Patent application filed and GDP
growth rate of Singapore


Thailand 

Thailand is a country where intellectual property has generated
much controversy. In the late 1980s, the debate about
controversial changes to the Copyright Act to strengthen the
position of rights holders even led to the dissolution of
parliament and the calling of new elections [14]. The
discussion subsequently shifted to patents and pharmaceuticals
during the 1990s. In view of the AIDS crisis in Thailand, the
government [...] negative impact on foreign investment [15].
The first Patent Act was passed in 1979. It was then amended in
1992 and again in 1999 [16]. Thailand receives quite a good
number of patent applications year after year: 10,561 in 2008
and 9,730 in 2009. The GDP growth rate also moves in almost the
same direction, except in 2005 and 2006. This can be seen in
Figure 6.

Figure 6: Comparison of Patent application filed and GDP
growth rate of Thailand


Vietnam

The protection of intellectual property rights was first
introduced in Vietnam in 1981 by the promulgation of the
Ordinance on Innovation and Invention 1981 ("Ordinance 1981")
[17]. The Ordinance on the Protection of Industrial Property
Rights enacted in 1989 ("Ordinance 1989") marked a turning
point for the industrial property laws of Vietnam [18]. For the
first time in the history of the country's IP protection, the
concept of "industrial property" was introduced in a legal
instrument. Ordinance 1989 provided the fundamentals for the
protection of inventions, utility solutions, industrial
designs, trademarks, and appellations of origin in the country.
Most importantly, Ordinance 1989 specifically recognized patent
rights as exclusive rights. It was then amended in the form of
the Civil Code 1995. A proper Intellectual Property Rights Law
was passed in 2005 [19]. The number of patent applications and
GDP growth rate are highly correlated in Vietnam, and hence it
can be said that there is an impact of GDP growth rate on the
number of patent applications filed. This can be seen in
Figure 7.

Figure 7: Comparison of Patent application filed and GDP
growth rate of Vietnam

Philippines


The Philippines is the country with the longest tradition of
intellectual property protection in the region, reaching back
to decrees introduced by the Spanish colonial power in the
early 19th century [20]. After a period of IP protection via
presidential decrees during the Marcos regime, the Philippines
was the first country in Southeast Asia to adopt a
comprehensive intellectual property code following WIPO [...]
in 1995. The Code covers patents, utility models, trademarks
and geographical indications, copyright, industrial designs,
layout designs of integrated circuits and undisclosed
information. It was then amended in 2006 and 2008. Regarding
the number of patent applications and GDP growth rate, the
Philippines is not a tech-savvy country and hence there is no
direct relationship between GDP growth rate and the number of
patent applications filed. This can also be seen in Figure 8.

Figure 8: Comparison of Patent application filed and GDP
growth rate of Philippines

Malaysia 


The Malaysian patent system generally originates from the
United Kingdom patent system. In 1983, the local system was
introduced via the Patents Act 1983. Accordingly, a complete
set of governmental mechanisms was established, allowing the
examination and subsequent registration of patents [21]. On May
16 2006, Malaysia became the 131st contracting state to the
World Intellectual Property Organisation Patent Cooperation
Treaty. The treaty was to enter into force in Malaysia on
August 16 2006. Regarding the patent applications [...]

Figure 9: Comparison of Patent application filed and GDP
growth rate of Malaysia


DATA ANALYSIS 

The data collected from different sources was analyzed to see
whether there exists a relation between a country's GDP growth
rate and the number of patent applications filed by domestic
and foreign applicants. The hypothesis was then tested at the
5% level of significance using Student's t-test.

The data showed a mixed picture across Asian countries
regarding the filing of patent applications and its
relationship with the GDP growth rate of the respective
country. Out of the sample of 9 countries, 5 countries, namely
India, China, Indonesia, Philippines and Malaysia (with
t-values less than 1.86), showed no effect of the number of
patent applications filed on GDP growth rate, while the other 4
countries, namely Singapore, Thailand, Japan and Vietnam (with
t-values greater than 1.86), showed an impact of the number of
patent applications on GDP growth rate. This is shown in
Table 2.





Table 2: T-test for GDP growth rate and number of patent
applications filed relation for the period 2000-2009.

Also, three countries, namely China, Indonesia and Malaysia,
showed a negative correlation, i.e. a negative relationship
between the number of patent applications filed and GDP growth
rate. The main reason behind this negative relationship is the
non-dependency or lesser dependency of GDP growth on the number
of patents filed. In most of the years in which the number of
patent applications was higher, there was a fall in the GDP
growth rate, and vice versa. This shows that many factors other
than innovations and their registration as patents affect the
GDP growth rate.


CONCLUSION AND RECOMMENDATIONS

Many studies have explored the relationship between economic
growth, competitiveness, innovation, IP and sustainable
development. These studies have generally used R&D investment
or the number of patents filed as proxies for innovation
[22][23][24]. This article examined the correlation between
patent applications filed and the financial growth of 9
selected countries of Asia. The study considered only one
variable for studying the financial effect of patent
applications filed on the economy of a country, i.e. the GDP
growth rate. From the data collected, it was found that in half
of the selected Asian countries GDP growth was not related to
the number of patent applications filed. These countries have
other factors affecting GDP growth, such as agriculture, the
service industry, the assembling of new technology from
outside, etc.

With the help of the literature review in this study, it can
also be concluded that in a few countries, like Singapore and
Philippines, the IPR regime is likely to affect growth
indirectly by encouraging the innovative activity that in turn
is the source of total factor productivity improvement, leading
to the overall development of the country.

The countries with a positive correlation (namely Singapore,
Thailand, Japan and Vietnam) indicate that, leaving aside all
other factors affecting GDP, innovations are the major factor
affecting GDP growth rate.

ABSTRACT

Due to the rapid growth of the Web from a few thousand pages
in 2000 to its current size of several billion pages, users
increasingly depend on web search engines for locating
relevant information. One of the main challenges for search
engines is to provide a good ranking function that can identify
the most useful results from among the many relevant pages,
and a lot of research has focused on how to improve ranking.
We present an effective caching scheme that reduces the
computing and I/O requirements of a Web search engine
without altering its ranking characteristics. The novelty is a
two-level caching scheme that simultaneously combines
cached query results and cached inverted lists on a real case
search engine. A set of logged queries is used to measure and
compare the performance and the scalability of the search
engine with no cache, with the cache for query results, with the
cache for inverted lists, and with the two-level cache.
Experimental results show that the two-level cache is superior,
and that it allows increasing the maximum number of queries
processed per second by a factor of three, while preserving the
response time.


KEYWORDS: Search Engines, Query Processing, Retrieval,
Ranking, Cache Design


1. INTRODUCTION

Large web search engines have to answer thousands of queries
per second with interactive response times. Due to the sizes of
the data sets involved, often in the range of multiple terabytes,
a single query may require the processing of hundreds of
megabytes or more of index data. To keep up with this
immense workload, large search engines employ clusters of
hundreds or thousands of machines, and a number of
techniques such as caching, index compression, and index and
query pruning are used to improve scalability. In particular,
two-level caching techniques cache results of repeated
identical queries at the frontend, while index data for
frequently used query terms are cached in each node at a lower
level. Popular search engines receive millions of queries daily,
a load never experienced before by any IR system.
Additionally, search engines have to deal with a growing
number of Web pages to discover, to index and to retrieve, and
must handle very large databases. To compound the problem,
search engine users want to experience small response times as
well as precise and relevant results for their queries. In this
scenario, the development of techniques to improve the
performance and the scalability of search engines without
degrading the quality of the results becomes a fundamental
topic of research in IR. One effective alternative for improving
performance and scalability of information systems is caching.
The effectiveness of caching strategies depends on some key
aspects, such as the presence of reference locality in the access
stream and the frequency at which the database being cached is
updated.

In this paper we describe and evaluate the implementation of
caching schemes that improve the scalability of search engines
without altering their ranking characteristics. The starting point
of the work is TodoBR, a state-of-the-art full scale operational
search engine that crawls the Brazilian Web. We enhanced the
current implementation of TodoBR by integrating three
caching schemes. The first one implements a cache of query
results, allowing the search engine to answer recently repeated
queries at a very low cost, since it is not necessary to process
those queries. The second one implements a cache of the
inverted lists of query terms, thus improving the query
processing time for new queries that include at least one
term whose list is cached. The third caching scheme combines
the two previous approaches and will be called the two-level
cache.


Each of the first two strategies presents advantages and
disadvantages. A hit in the cache of query results avoids query
processing, while a hit in the cache of inverted lists reduces the
amount of I/O associated with answering a query, but does not
avoid the query processing costs. On the other hand, the hit
ratio associated with inverted lists is usually higher than the hit
ratio for whole queries, which may pay off the query
processing cost. The motivation behind the third strategy is to
exploit the advantages of the first two strategies to improve
even further the overall performance and scalability of
search engines.
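
The way the two levels cooperate can be sketched in a few lines
of Python. This is only an illustrative sketch under stated
assumptions, not the TodoBR implementation: the class name, the
two callbacks (`read_list_from_disk`, `evaluate`) and the
unbounded dictionaries are hypothetical, and eviction is
deliberately omitted to keep the lookup order visible.

```python
class TwoLevelCache:
    """Result cache in front of an inverted-list cache (no eviction)."""

    def __init__(self, read_list_from_disk, evaluate):
        self.results = {}   # level 1: query string -> final answer
        self.lists = {}     # level 2: term -> inverted list
        self.read_list_from_disk = read_list_from_disk
        self.evaluate = evaluate
        self.disk_reads = 0

    def inverted_list(self, term):
        # A hit here saves I/O but not the query processing cost.
        if term not in self.lists:
            self.disk_reads += 1
            self.lists[term] = self.read_list_from_disk(term)
        return self.lists[term]

    def answer(self, query):
        # A hit here avoids query processing altogether.
        if query in self.results:
            return self.results[query]
        lists = {term: self.inverted_list(term) for term in query.split()}
        result = self.evaluate(lists)
        self.results[query] = result
        return result


# Toy index and a toy "ranking" (conjunctive match), for illustration only.
toy_index = {"web": [1, 2], "cache": [2, 3]}
engine = TwoLevelCache(
    read_list_from_disk=lambda term: toy_index.get(term, []),
    evaluate=lambda lists: sorted(set.intersection(*map(set, lists.values()))),
)
print(engine.answer("web cache"), engine.disk_reads)
print(engine.answer("cache"), engine.disk_reads)
```

In the usage above, the second query is new to the result cache
but causes no additional disk read, because the list for its
only term was already cached by the first query.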

Our experimental evaluation yields some key results. The
two-level cache is superior and allows increasing the maximum
throughput by a factor of three, relative to an implementation
with no cache. Furthermore, the throughput of the two-level
cache is up to 52% higher than the implementation using just
the cache of inverted lists and up to 36% higher than the cache
of query results. Our work is distinct from previous ones
because it presents experimental results on the effectiveness
of different caching strategies implemented on a real case
search engine. Our main contribution is the two-level caching
scheme we propose, which yields superior performance. Our
results can be replicated on other Web search engines, since
there is high similarity between the workload characteristics
present in the logs of the TodoBR search engine and in the logs
of other large search engines.


2. SEARCH ENGINE ARCHITECTURE

Web search engines are IR systems that take a query as input
and produce as a result a set of links to relevant Web pages
related to the query. Search engines seek, collect and index
Web pages on a massive scale. To speed up query processing,
queries are answered using the index and without accessing
the text directly.

Efficient query evaluation requires specialized index
techniques when the text collection is large. Our search engine
server implementation uses an inverted file as index structure,
a popular choice to implement large scale IR systems. An
inverted file is typically composed of a vocabulary, which
contains the set of all distinct terms in the collection, and an
inverted list for each term of the vocabulary. The inverted list
of a term t is a list of the identifiers of the documents
containing t with the respective frequency of occurrences of t
in each document.

The ranking method used for the experiments is based on the
vector space model. In the vector space model, the documents
and the queries are represented as vectors in a space with
dimensions given by the size of the vocabulary. The answers to
the queries are the documents with the highest similarity
values, where the similarity is computed by the cosine of the
angle between the query vector and each document vector. The
inverted file is used during query processing time to compute
the similarities of each document of the collection against the
query.
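
A minimal sketch of this kind of ranking over an inverted file
is given below. It is an illustration under stated assumptions
rather than the ranking function of the search engine: the
function names are hypothetical, terms are obtained by simple
whitespace splitting, and the term weight (a smoothed inverse
document frequency multiplied by the term frequency) is an
assumed choice, since the exact weighting is not specified
here.

```python
import math
from collections import Counter, defaultdict


def build_inverted_file(docs):
    """Map each term to its inverted list of (doc_id, frequency) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.split()).items():
            index[term].append((doc_id, freq))
    return dict(index)


def term_weight(term, index, num_docs):
    # Assumed weighting: a smoothed inverse document frequency.
    return math.log(1.0 + num_docs / len(index[term]))


def document_norms(index, num_docs):
    """Precompute the Euclidean length of every document vector."""
    squares = defaultdict(float)
    for term, postings in index.items():
        w = term_weight(term, index, num_docs)
        for doc_id, freq in postings:
            squares[doc_id] += (freq * w) ** 2
    return {doc_id: math.sqrt(s) for doc_id, s in squares.items()}


def rank(query, index, norms, num_docs):
    """Return (cosine, doc_id) pairs, best match first."""
    accumulators = defaultdict(float)
    query_square = 0.0
    for term, q_freq in Counter(query.split()).items():
        if term not in index:
            continue
        w = term_weight(term, index, num_docs)
        query_square += (q_freq * w) ** 2
        # Only documents on the inverted list of a query term are touched.
        for doc_id, freq in index[term]:
            accumulators[doc_id] += (q_freq * w) * (freq * w)
    q_norm = math.sqrt(query_square)
    return sorted(
        ((score / (q_norm * norms[doc_id]), doc_id)
         for doc_id, score in accumulators.items()),
        reverse=True,
    )


docs = {1: "web search engine cache", 2: "cache cache memory", 3: "web pages"}
index = build_inverted_file(docs)
norms = document_norms(index, len(docs))
print(rank("cache", index, norms, len(docs)))
```

Note that the whole inverted list of every query term is read,
which is the cost that the following paragraphs set out to
reduce.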

For large document databases, the cost of evaluating the cosine
measure may be high, because it assigns a similarity measure to
every document containing any of the query terms, requiring a
read and some processing on the whole inverted list of each
term of the query. This task may be expensive since some of the
terms can occur in a high proportion of the documents present
in the database.

An effective technique to compute an approximation of the
cosine measure without significant changes in the final ranking
for each query has already been proposed. We use it to process
the queries submitted to the search engine server. This query
evaluation technique uses early recognition of which documents
are likely to be highly ranked to reduce the costs of query
processing. Queries are evaluated in 2% of the memory of the
standard cosine implementation without degradation in retrieval
effectiveness. Disk traffic and CPU time are also reduced
because the algorithm processes only the portions of the
inverted lists which have information that can change the
ranking.


3. CACHE DESIGN

In this section, we describe in detail the strategies for
implementing the three caches in a search engine, that is,
caching of query results, caching of inverted lists, and a
two-level cache that combines both.


Cache of Query Results 

Our strategy for caching query results is to keep in memory the
list of documents associated with a given query. For each
document we store its URL, its title, and a 250 character
abstract. The very first implementation issue of this caching
strategy is determining the number of document references that
should be cached for each query. Notably, the number of
documents that match a given query is often huge. However, the
great majority of the users request at most the first 30
references that match a query. In TodoBR we observe the same
behavior, since most of the users (70%) do not request more
than 10 references, and 90% of the query requests are for at
most the first 50 references. Thus, we limited our cache of
query results to 50 references, resulting in a storage
requirement of 25 kilobytes per query result cached. This
implementation decision allows our cache to satisfy most of the
queries without wasting memory, and also exploits the spatial
locality among queries. Figure 1(a) shows the architecture of
the search engine including the cache of query results.
Whenever a user submits a query to the search engine, it checks
whether the cache is storing the associated query results and
the reference rank is below the caching threshold, in our case
50. If there is a cache hit, the query result is immediately
returned to the user, at a very low cost, since the response
only needs to be formatted and sent to the user, a cost
inherent to any query. Otherwise, the search engine processes
the query normally, caching the result whenever the reference
rank is below the threshold.

The second major issue is the replacement policy for the query
results, that is, how we determine which query results should
be evicted from the cache whenever a new set of results is to
be cached and the cache is full. In this first implementation we
adopted LRU (least recently used) as replacement policy, since
the TodoBR logs present a good temporal locality. Markatos
has proposed alternative cache replacement policies for
caching query results, such as SLRU (segmented LRU) and
FBR (frequency based replacement), but they did not improve
the cache hit ratio significantly. Furthermore, Markatos did not
exploit spatial locality in his work, in the sense that a query
result for the first ten documents is handled independently
from the result for the next ten documents of the same query.
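
The behaviour described in this subsection (a per-query limit
of 50 references and LRU replacement) can be sketched as
follows. This is a simplified illustration, not the TodoBR
code: the class and method names are hypothetical, results are
represented as a plain list, and the capacity is counted in
queries rather than in bytes.

```python
from collections import OrderedDict


class QueryResultCache:
    """LRU cache holding at most the first `max_refs` results per query."""

    def __init__(self, capacity, max_refs=50):
        self.capacity = capacity          # maximum number of cached queries
        self.max_refs = max_refs          # caching threshold per query
        self._entries = OrderedDict()     # least recently used entry first

    def get(self, query, first, count):
        """Return results [first, first + count) or None on a miss."""
        if first + count > self.max_refs:
            return None                   # beyond the threshold: never cached
        results = self._entries.get(query)
        if results is None:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return results[first:first + count]

    def put(self, query, results):
        """Cache the top results of a freshly processed query."""
        self._entries[query] = results[:self.max_refs]
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


cache = QueryResultCache(capacity=2)
cache.put("a", list(range(100)))
cache.put("b", [1, 2, 3])
cache.get("a", 0, 10)       # touching "a" makes "b" the eviction candidate
cache.put("c", [9])
print(cache.get("b", 0, 10))    # evicted
print(cache.get("a", 10, 10))   # second page served from the same entry
```

Because one entry holds all cached references of a query, a
request for the second page of results is served from the entry
created by the first page, which is the spatial locality
mentioned above.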





Figure 1(b): Inverted Lists

Figure 1(c): Two Levels


Cache of Inverted Lists

Our strategy for caching inverted lists is to keep in memory the
list of Web documents associated with a given query term. In
practice, our enhanced search engine caches the inverted lists
for each term as they are accessed, and uses these lists to
[...]. In this case, the integration with the search engine is
straightforward, since it acts as a specialized buffer for the
index, which is usually stored in secondary memory. The main
motivation for caching inverted lists is the good reference
locality that is usually observed among individual search
terms. Since the term locality is even greater than the query
locality, and thus may attain a higher cache hit ratio, caching
inverted lists is a good strategy for improving the scalability
of search engines. The implementation of caches of inverted
lists has to face two issues related to the high variance in
the size of the inverted lists: the size of the cached lists
and the internal organization of the cache.
These issues are discussed in the remainder of this section.
The size of the inverted lists is a function of both the term
popularity in the collection and the number of documents being
indexed. For large collections, these lists may also become
very large, making a cache of inverted lists fail in practice,
since considerable cache space is required to store a whole
list. To address this problem, we turn to an important
characteristic of the filtered vector model processing
technique. In this technique, the inverted lists are sorted by
the frequency of occurrence of the term in each document, and
the query processing exploits the frequency variance by using
just the documents in which the term is most frequent. As a
consequence, the lists are not fully traversed or are not
traversed at all, depending on the relevance of the term in the
collection and in the query. In summary, the filtered vector
model naturally handles the problem associated with large
inverted lists.

. Since lists are almost always partially processed, we set out to 
, cache parts of lists. The frequency-sorted inverted lists can be 
| Partitioned in different ways. The lists are naturally divided 
into blocks of documents in which the term appears with the 
; same frequency, and these are the smallest units of algorithm 
. processing. These blocks present interesting properties 
| regarding their size and access pattern. The first blocks of cach 
' fist are small, consisting of few documents, and are much more 
frequently accessed than the blocks at the end of the lists, 





' 
1 
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which contain the documents in which the term appears a few 
times, In the model, given an inverted list of a term t, for some 
integer v (usually 2 to 4), a fraction (v - 1Yv of the document 
identifiers have frequency 1 (fa, = 1); of the remainder a 
fraction (v - 1Vv have fd.t = 2, and so on. If v is 2, for example, 
half of the list will correspond to the block of documents in 
which the term appears only once. Blocks could be tbe objects 
to be cached, but their size distribution spans several orders of 
magnitude, making caching much more complex. Since the 
objects cached by a Web cache (html files, images, etc), also 
present extremely high variable sizes. 
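
The geometric size model described above can be illustrated with a short sketch. The function name `block_sizes` and the list length of 1,024 entries are hypothetical choices made only for illustration; integer rounding is also an assumption of the sketch.

```python
def block_sizes(list_length, v):
    """Split a frequency-sorted inverted list into blocks under the geometric model.

    A fraction (v - 1)/v of the entries has frequency 1, the same fraction of
    the remainder has frequency 2, and so on. Returns {frequency: block size}.
    """
    sizes = {}
    remaining = list_length
    freq = 1
    while remaining > 0:
        # Integer arithmetic; a block always holds at least one entry.
        size = max(1, remaining * (v - 1) // v)
        sizes[freq] = size
        remaining -= size
        freq += 1
    return sizes


sizes = block_sizes(1024, 2)
for freq in sorted(sizes, reverse=True):
    # Highest frequencies come first in a frequency-sorted list.
    print(f"frequency {freq:2d}: {sizes[freq]:4d} documents")
```

With v = 2 the block of frequency 1 holds half of the list, while the blocks at the head of the list (the highest frequencies) hold only one or two documents, which is the size spread that makes blocks awkward as cacheable objects.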

Using blocks as cacheable objects presents some advantages, 
but requires prefetching strategies and specific admission and 
replacement policies. For example, the first blocks of the lists 
tend to be very small and are generally accessed together. If no
prefetching is done when the first block of a list is requested
from the disk by the cache, a large number of disk seek
operations is needed to retrieve several small objects.

Another issue arises when the cache requests the last block of 
some large list. This is likely to be a large block, and its 
admission into the cache could cause the eviction of several 
other smaller but much more accessed blocks. These 
mechanisms and policies are certainly worthy of further study, 
but in this work we conjecture that many of the advantages of
caching blocks can be attained by using a simpler alternative 
approach, namely to "page" the lists, i.e., to divide them into 
equally sized pages. We should observe that, based on the 
aforementioned distribution of sizes of blocks, the first pages 
of an inverted list may contain several blocks, while the last 
blocks of the list may span several pages. In this work we 
employed a page of 4 kilobytes which is also the disk block 
size. In our implementation, the cache only has knowledge of 
pages, and this makes for much simpler cache design. 
Furthermore, by varying the size of the pages, we can balance
the tradeoff between the number of seek operations and the
volume of bytes transferred from the disk. At one extreme, in
which each byte of the inverted list is considered to be a page,
there will be at least as many misses in the cache as the number
of bytes needed to answer a given workload of queries. The
number of seek operations is maximal, while the volume of
bytes transferred is minimal.

At the other extreme we consider a large page size, such that
each list requires at most one miss in the cache. In this case,
the number of seek operations is minimal, but the volume of
bytes transferred is much larger than what is needed to answer
the queries. Large pages have an amortizing effect on the disk
seek time, and implicitly exploit spatial locality among list
blocks, but may, on the other hand, cause the cache to store
irrelevant parts of lists. Depending on the combination of
factors, such as the costs associated with a disk seek operation
and with the transfer of a byte, one can find an optimal
page size. Other factors that should be taken into consideration
are the disk block size and any operating system cache in
effect. Figure 1 (b) illustrates the architecture of a search
engine that embeds the cache of inverted lists. The query is
processed until there is a request to read a block, which is
mapped to a page, from the inverted list, at which point the
cache is checked. The disk is accessed
only in the case of a miss in the cache of inverted lists. Again,
we employed LRU as the replacement policy. Although the cache
of inverted lists avoids disk accesses, every query submitted to
the system must still be processed, and gains in performance
depend on the computational platform where the search engine
runs.
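
The paged cache and the page-size tradeoff discussed above can be sketched as follows. This is a minimal illustration, not the TodoBR implementation: the class `PagedListCache`, the function `replay`, and the three block accesses are hypothetical, and each miss is counted as one seek, which ignores the possibility of reading contiguous pages in a single disk operation.

```python
from collections import OrderedDict


class PagedListCache:
    """LRU cache of fixed-size pages of inverted lists (illustrative sketch)."""

    def __init__(self, capacity_pages, page_size=4096):
        self.capacity = capacity_pages
        self.page_size = page_size
        self.pages = OrderedDict()  # (term, page number) -> True, oldest first
        self.misses = 0             # each miss stands for one disk read
        self.bytes_read = 0         # bytes transferred from the disk

    def read(self, term, offset, length):
        """Touch every page overlapping bytes [offset, offset + length) of a list."""
        first = offset // self.page_size
        last = (offset + length - 1) // self.page_size
        for page_no in range(first, last + 1):
            key = (term, page_no)
            if key in self.pages:
                self.pages.move_to_end(key)         # hit: mark as most recent
            else:
                self.misses += 1                    # miss: fetch the page
                self.bytes_read += self.page_size
                self.pages[key] = True
                if len(self.pages) > self.capacity:
                    self.pages.popitem(last=False)  # evict least recently used


def replay(page_size):
    """Replay a tiny hypothetical block trace and report (misses, bytes read)."""
    cache = PagedListCache(capacity_pages=1_000_000, page_size=page_size)
    # (term, byte offset of the block in its list, block length in bytes)
    trace = [("web", 0, 100), ("web", 100, 300), ("cache", 0, 6000)]
    for term, offset, length in trace:
        cache.read(term, offset, length)
    return cache.misses, cache.bytes_read


for size in (512, 4096, 65536):
    print(size, replay(size))
```

On this toy trace, growing the page from 512 bytes to 64 kilobytes lowers the number of misses from 13 to 2 while raising the bytes transferred from 6,656 to 131,072, which is the tradeoff described in the text.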


Two-Level Cache 

As discussed in the previous sections, each of the two cache 
architectures presents advantages and disadvantages. The 
cache of query results avoids processing queries which are 
already in the cache, while a hit in the cache of inverted lists 
only avoids disk accesses. On the other hand, the hit ratios 
obtained for the query results are smaller than the hit ratios 
obtained by the cache of inverted lists. These observations led
us to propose and test a third cache option, which combines the
two caching strategies. We call this option the two-level cache.
Figure 1 (c) shows the architecture of the search engine with a
two-level cache system. Each request to the search engine is
checked first in the cache of query results. If it is a hit, the
query is answered immediately; otherwise the query is
processed and the cache of inverted lists is used to reduce the
number of disk accesses.
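
The lookup order of the two-level cache can be sketched as below. The sketch rests on several simplifying assumptions that are not part of the original system: the class names `LRUCache` and `TwoLevelSearch` are hypothetical, whole lists are cached instead of pages, the index is an in-memory dictionary standing in for the disk, and ranking is replaced by a plain union of document identifiers.

```python
from collections import OrderedDict


class LRUCache:
    """Small LRU cache; get() returns None on a miss."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)


class TwoLevelSearch:
    def __init__(self, index, result_capacity, list_capacity):
        self.index = index                        # term -> doc ids (the "disk")
        self.results = LRUCache(result_capacity)  # first level: query results
        self.lists = LRUCache(list_capacity)      # second level: inverted lists
        self.queries_processed = 0
        self.disk_reads = 0

    def _inverted_list(self, term):
        postings = self.lists.get(term)
        if postings is None:                      # miss: go to the disk
            self.disk_reads += 1
            postings = self.index.get(term, [])
            self.lists.put(term, postings)
        return postings

    def search(self, query):
        cached = self.results.get(query)
        if cached is not None:                    # first-level hit: no processing
            return cached
        self.queries_processed += 1
        lists = [self._inverted_list(t) for t in query.split()]
        docs = sorted(set().union(*lists))        # stand-in for real ranking
        self.results.put(query, docs)
        return docs


engine = TwoLevelSearch({"web": [1, 2], "cache": [2, 3]}, 10, 10)
engine.search("web cache")   # processed; both lists read from the disk
engine.search("web cache")   # answered by the cache of query results
engine.search("web")         # processed, but its list is already cached
print(engine.queries_processed, engine.disk_reads)
```

A repeated query is absorbed by the first level, while a new query that shares terms with earlier ones is still processed but finds its lists in the second level.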

* The L1 cache receives addresses from the prefetch unit
  and returns instructions either from the cache or from
  the next level of the memory hierarchy. The cache
  also receives addresses from the execution unit and
  reads or writes operands, again from the cache or
  from the next level of the hierarchy. The handling of
  writes varies with different write algorithms. If
  separate L1 instruction and data caches are present,
  they respond to the instruction fetch and instruction
  execution units, respectively.

* The L2 cache receives addresses from the L1 cache (or
  caches) and reads or writes operands from its storage
  or from the primary memory system. The handling of
  writes varies with different write algorithms.

* The bus is a half-duplex data path connecting the
  caches to the memory system. Devices on the bus
  must arbitrate for bus ownership before commands or
  data can be sent.

* The primary memory consists of a number of
  interleaved memories. Simulation parameters include
  the interleaving factor, access time, and cycle time of
  main memory.

Basic Two-Level Simulation Model


4. WORKLOAD CHARACTERIZATION 

In order to assess the behavior of the three cache 
implementations we consider in this paper, we perform an 
analysis of a partial log of queries submitted to TodoBR, 
comprising 100,256 queries. There is a total of 37,450 unique 
queries, and 23,751 unique terms in the log. We focus on 
aspects relevant to both levels of caching we consider, namely 
the characteristics of the stream of queries present in the log 
relevant to the cache of query results and of the stream of page 
references generated by the query processor - influencing the 
behavior of the cache of inverted lists. 

In the case of the cache of inverted lists, we study its behavior 
under two different workloads, the first one with all the 
queries, and the second one with only the unique queries. To 
understand the reasons for this consideration, let us examine 
what happens to the cache of inverted lists under different 
configurations of the cache of query results. When used
stand-alone, the cache of inverted lists receives from the query
processor a page workload originating from all of the queries
received by the search engine. This is precisely the workload
represented by the 'All Queries' workload.

On the other hand, suppose a two-level implementation in
which the cache of query results is large enough not to have
any miss caused by eviction from the cache, i.e., it can store
the results of every query that it receives. In this situation, the
query processor, and thus the cache of inverted lists, will only
process the unique queries, for all the repetitions will be
handled by the cache of query results. The workload the cache
of inverted lists will be subject to is well represented by the
'Unique Queries' workload. There will be a smooth transition
from one workload to the other for varying sizes of the cache
of query results, meaning that we can gain valuable insight into
the performance of the cache of inverted lists for a wide range
of situations. A very small cache of query results will generate
a workload at the cache of inverted lists similar to the 'All
Queries' workload, while a large cache of query results will
generate a workload close to the 'Unique Queries' workload.

We start our workload characterization by analyzing the
popularity of both queries and pages of the cache of inverted
lists, with the objects sorted by decreasing popularity, that is,
the most popular object is the first in the rank.
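
The transition between the 'All Queries' and 'Unique Queries' workloads can be reproduced on a toy log with the sketch below. The function `second_level_workload` and the six-query log are hypothetical, and the first-level cache is sized in number of queries instead of bytes to keep the example short.

```python
from collections import OrderedDict


def second_level_workload(log, result_cache_size):
    """Queries that miss an LRU cache of query results and reach the query processor."""
    cache = OrderedDict()
    missed = []
    for query in log:
        if query in cache:
            cache.move_to_end(query)       # hit: answered by the first level
            continue
        missed.append(query)               # miss: processed, touching inverted lists
        if result_cache_size > 0:
            cache[query] = True
            if len(cache) > result_cache_size:
                cache.popitem(last=False)  # evict the least recently used query
    return missed


log = ["a", "b", "a", "c", "a", "b"]
for size in (0, 2, 3):
    print(size, second_level_workload(log, size))
```

With no first-level cache all six queries reach the query processor ('All Queries'); with room for every distinct query only the three first occurrences do ('Unique Queries'); intermediate sizes fall in between.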

For a reference stream to offer a good opportunity for caching, it
ought to exhibit temporal locality among its references. In fact,
the authors conclude that popularity is the main source of
locality, especially in dealing with reasonably sized caches, and
that a reference stream whose object popularity follows a
Zipf-like distribution exhibits a high degree of temporal locality.
Zipf's law relates the popularity rank p of an object to the
probability P that it is requested, by P ~ 1/p, and has been
applied to several distinct contexts, such as words in natural
language and accesses to web pages. We call a Zipf-like
popularity distribution one in which the relation between P
and p is given by P ~ 1/p^a. This is a generalization of Zipf's law
and in a log-log plot of popularity versus rank appears as a
straight line with slope -a. The smaller a is, the less skewed
the distribution is, showing weaker temporal locality and worse
cacheability.
We verified that the references to queries follow a Zipf-like
distribution. In Figure 2 we plot the relative popularity, i.e., the
probability of accessing each query, versus the popularity rank
for the query stream, together with a Zipf-like distribution
with an a parameter of 0.59, obtained by a least-squares fitting
of the data.
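
A least-squares fit of this kind can be sketched as an ordinary linear regression in log-log space. The function `fit_zipf_exponent` is a hypothetical helper written for illustration; the exact fitting procedure used for Figure 2 is not described in the text, so this is only one plausible way to obtain the parameter a.

```python
import math
from collections import Counter


def fit_zipf_exponent(stream):
    """Least-squares estimate of a in P ~ 1/p^a from a stream of references.

    Fits log P = c - a * log p over the objects ranked by decreasing
    popularity. The stream must contain at least two distinct objects.
    """
    counts = sorted(Counter(stream).values(), reverse=True)
    total = sum(counts)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count / total) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope of the fitted line is -a


# Synthetic stream whose counts (12, 6, 4, 3) are exactly proportional to 1/rank.
stream = ["q1"] * 12 + ["q2"] * 6 + ["q3"] * 4 + ["q4"] * 3
print(round(fit_zipf_exponent(stream), 6))
```

On the synthetic stream the estimate is 1, up to floating-point error, as expected for a pure Zipf distribution; a real query log would be passed in as the sequence of query strings.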
In Figure 3 we examine the popularity distribution for both
workloads of the cache of inverted lists. We can notice a pair
of similar curves, labeled 'All Queries' and 'Unique Queries'.
There are two regions in these two curves, one up to roughly
rank 2,500, with large flat segments, and one after this point,
which is approximately a straight line in the log-log plot with
a slope of -1. The flat region occurs due to the page access
pattern. The first pages of each list are accessed as a group,
meaning that they should have approximately the same
probability of being accessed. This suggests, for caching
purposes, that the pages making up the flat region should
necessarily be stored in the cache if it is to have a good level
of efficiency. The second region, which comprises more than 90%
of the pages, exhibits a Zipf-like behavior, and is well fit by
one such distribution with a = 1. This indicates that the
distribution is much more skewed than that of the queries'
popularities, resulting in greater temporal locality.

The distribution does not vary much between the two workloads,
meaning that there is opportunity for caching inverted lists
even if this caching is to be done after a fully efficient first
level cache of query results. In order to further investigate this
opportunity, we collected statistics of the number of distinct
queries in which each term appears. In the situation of a fully
effective cache of query results, resulting in the 'Unique
Queries' workload to the cache of inverted lists, the terms that
appear in only one query shall not generate a hit, because their
pages will only be seen once by the cache of inverted lists. We
found out that approximately 40% of the terms appear in more
than one query, evidencing the extra locality that can be
exploited by the cache of inverted lists.


Cache Miss Ratios 

To assess the behavior of a cache under an LRU replacement
policy, we generated the successive stack distances from the
log. The marginal distribution of stack distances can be used to
determine the miss ratio for a cache at different sizes. Let D be
the random variable corresponding to stack distance, and let
F_D be the cumulative distribution function for D. The miss
ratio m(x) for a cache holding x objects is given by

m(x) = P[D > x] = 1 - F_D(x)
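
The relation between stack distances and the miss ratio can be checked with a small sketch. The functions `stack_distances` and `miss_ratio` are hypothetical helpers; the quadratic-time list scan is chosen for clarity and would be too slow for a full log, and first references are treated as having infinite distance.

```python
def stack_distances(stream):
    """LRU stack distance of each reference; None marks a first occurrence."""
    stack = []  # most recently used object at the end
    distances = []
    for obj in stream:
        if obj in stack:
            pos = stack.index(obj)
            distances.append(len(stack) - pos)  # 1 = the most recent object
            stack.pop(pos)
        else:
            distances.append(None)  # infinite distance: a miss at any size
        stack.append(obj)
    return distances


def miss_ratio(distances, x):
    """m(x) = P[D > x] for an LRU cache holding x objects."""
    misses = sum(1 for d in distances if d is None or d > x)
    return misses / len(distances)


distances = stack_distances(["a", "b", "a", "c", "b", "a"])
# distances == [None, None, 2, None, 3, 3]
for x in (1, 2, 3):
    print(x, miss_ratio(distances, x))
```

For this stream a cache of three objects misses only on the three first references, so m(3) = 0.5, which is also the miss ratio an infinite cache would show.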

The first observation from the graph is the minimum miss ratio
we can obtain under this query workload, which is around
40%. This is the miss ratio that an infinite cache would exhibit,
and is due to the first occurrence of each query. The most
important fact the graph shows is how fast the miss ratio
decreases as we increase the capacity of the cache, relative to
the TodoBR log we considered. We can observe a 'knee' in the
curve close to 10 megabytes, indicating that a relatively small
fraction of the queries accounts for a significant portion of the
accesses.

This is a good indicator of the cache size that offers a good
compromise between space and hit ratio. After this point, small
decreases in the miss ratio come at the expense of large
increases in cache size. It is with these considerations that we
choose, for the following experiments, a cache size of 20
megabytes for query results. We point out that the fact that a
cache of this size holds most of the working set of the workload
is much more important than the size itself, which should be
determined on a case-by-case basis, by analyzing the miss ratio
curve for the workload.

We can see similar miss ratio versus cache size curves for the 
cache of inverted lists under the two workloads considered. 
One can notice that the cache size at which there is a 
significant decrease in the miss ratio is much larger than in the 
case of the cache of query results, suggesting that the working 
set of the pages requires more cache space. 

However, the asymptotic miss ratio observed is much lower in
the case of the cache of inverted lists, even for the 'Unique
Queries' workload. This shows the greater temporal locality
present in the references to pages, as was inferred from the
popularity distributions. The miss ratio of the 'All Queries'
workload is considerably lower than that of the 'Unique
Queries' workload, because in the latter only the repetition of
terms across different queries causes hits in the cache. Still, a
250-megabyte cache of inverted lists subject to the 'Unique
Queries' workload, i.e., the worst-case workload for the second
level cache, can achieve hit ratios of 80% on top of the misses
at the first level.

We have a final word on the scalability of the characteristics
presented herein. As we increase the length of the request 
stream submitted to the cache, the popularity distribution of 
queries and thus the marginal distribution of stack distances 
tend not to change much, meaning that a relatively small cache 
size should still be effective. Furthermore, the miss ratio tends 
to decrease as we increase the length of the request stream. 


5. EXPERIMENTAL RESULTS 

We present in this section experimental results that show the 
practical impact of the three caching schemes discussed on the 
scalability and on the average response time of the search 
engine as a whole. 

The experimental environment comprises two machines
running the Linux operating system, version 2.2.16. The search
engine runs on a Pentium III 550 MHz machine with 512
megabytes of main memory and a 36-gigabyte SCSI disk.
The client runs on an AMD K6 450 MHz machine with 256
megabytes of main memory. The two machines are connected
directly by a 100-megabit Fast Ethernet.

We employ the software Httperf to read a log of 100,256
queries submitted to TodoBR and to generate workload to the
various server implementations at controlled rates. It measures
the performance of the server from a client perspective,
reporting, among other information, the average response time
for the client to receive an answer and the throughput of the
server.

The overall amount of server main memory used for the
various cache implementations was set to 270 megabytes,
based on the results presented in Section 4. In the two-level
cache the memory was divided into two partitions: 20
megabytes for caching query results and 250 megabytes for
caching inverted lists. A cache of 270 megabytes proves to be
enough to achieve good performance in all cache schemes
studied in this work and accounts for only 6.3% of the
index size of TodoBR.


Table 1


Table 1 shows the counts for submitted queries and inverted 
list pages retrieved from disk, as an indication of CPU and disk 
demands for the four implementations. We can observe that 
caching query results reduces significantly (up to 62%) the 
number of queries that need to be processed. 

On the other hand, caching inverted lists reduces the number of 
page reads by an order of magnitude. The two-level cache 
shows to be a good compromise in terms of performance, since 
it gets close to the best results, that is, the number of queries 
processed increases by only 21%, and the number of pages 
retrieved increases by only 3%. 

At low request rates, the best performance was achieved by the
cache of query results, which presents the lowest processing
costs, closely followed by the two-level implementation, while
the cache of inverted lists gives response times close to the
implementation with no cache. This result is explained by the
overhead associated with handling inverted lists and the gains
inherent to the file system cache provided by the Linux
operating system, which reduces the time to read a disk page.
At higher request rates the disk throughput saturates and the
cache of inverted lists effectively improves the engine
performance when compared to the implementation with no
cache. The differences in the number of disk operations also
explain the better scalability of the two-level cache. As shown
in Table 1, the two-level cache presented a miss ratio in terms
of query results close to the miss ratio of the cache of query
results. On the other hand, the total number of disk reads in the
two-level cache was only 20% of the total number of reads
performed when caching only query results.

An immediate consequence of the better performance provided
by the two-level cache is a better overall throughput. The
maximum throughput obtained by the two-level cache is 64
queries per second, while the maximum for the system with no
cache was 22 queries per second. For the cache of inverted
lists, the maximum throughput was 42 queries per second. For
the cache of query results, the maximum throughput was 47
queries per second.
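
The relative gains implied by these four figures can be worked out directly. The dictionary `MAX_THROUGHPUT` and the function `relative_gain` below are hypothetical names; the numbers are the maximum throughputs quoted in the paragraph above.

```python
# Maximum throughput (queries per second) reported in the text above.
MAX_THROUGHPUT = {
    "no cache": 22,
    "inverted lists": 42,
    "query results": 47,
    "two-level": 64,
}


def relative_gain(scheme, baseline):
    """Fractional throughput gain of `scheme` over `baseline`."""
    return MAX_THROUGHPUT[scheme] / MAX_THROUGHPUT[baseline] - 1.0


for baseline in ("no cache", "inverted lists", "query results"):
    gain = relative_gain("two-level", baseline)
    print(f"two-level vs {baseline}: +{gain:.0%}")
```

The two-level cache therefore sustains roughly 2.9 times the throughput of the system with no cache (64/22), about 52% more than the cache of inverted lists (64/42) and about 36% more than the cache of query results (64/47).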


6. CONCLUSIONS 

In this paper, we have proposed and experimentally evaluated a
new multi-level caching architecture and scheme for web
search engines that can improve query throughput and
search engine scalability without modifying the ranking of
query results. We have implemented and evaluated three
different caching schemes on the search engine TodoBR, and
compared the performance of these implementations to the
original engine with no cache. The experiments show that the
two-level cache provides the maximum throughput among all
implementations, and that it is superior to the implementation
with no cache by a factor of three. Furthermore, the throughput
of the two-level cache is up to 52% higher than that of the
implementation using just inverted lists and up to 36% higher
than that of the cache of query results. The analysis of the
TodoBR logs indicates that the miss ratios of both caches tend to
decrease as we consider larger request streams. We are also
interested in studying the impact of caching in search engines
which are based on other ranking algorithms, such as ranking
based on link analysis. The changes in the ranking algorithm
can affect the cache system because the access pattern for the
inverted lists may change and extra information may have to
be retrieved from other index structures apart from the inverted
lists. To our knowledge there is no published work on how to
apply pruning to such types of ranking functions, which are not
based on a simple combination of the scores for different
terms.

ABSTRACT

With the increase in integration density and complexity of the
System-on-Chip (SoC), conventional interconnects are no longer
suitable to fulfill the demands. The application of traditional
network technologies in the form of Network-on-Chip (NoC) is a
potential solution. The NoC design space has many variables.
Selection of a better topology results in lower complexity and
better power efficiency. The proposed work addresses a key
research area in Network-on-Chip design, the communication
infrastructure, focusing especially on optimized topology design.
The simulation is modeled using a conventional network
simulator tool, Packet Tracer 5.3, in which a 35.7% reduction in
traversing the longest path is observed when the proposed
topology is selected.


KEYWORDS 
NoC, SoC, Routing, Mesh, Packet Tracer


INTRODUCTION

Recent technological development in the field of integrated
circuits has enabled designers to accommodate billions of
transistors. This level of integration has enhanced
computational power enormously. The exponential decrease in
the feature size has enabled integration of heterogeneous IP
cores on a single chip, leading to a new era of integrated
circuits known as System-on-Chip. However, as the number of
components and their performance continue to increase, the
design of a power-, area- and performance-efficient
communication infrastructure is gaining equal importance. The
traditional methods of connecting these heterogeneous IP cores
are not meeting the demands of these very complex structures.
Furthermore, with technology scaling, traditional global
interconnects cause problems like synchronization errors,
unpredictable delays and high power consumption [1].
Traditional bus- and crossbar-based approaches to communication
become very inefficient, resulting in massive numbers of wires,
failed timing closure, increased heat and power consumption,
and routing congestion leading to increased die area. The
Network-on-Chip approach promises an alternative to
traditional bus-based and point-to-point communication
structures. Networking methods have been dealing with the
same kind of problems in traditional computer networks. This
indicates that NoC designers can borrow the concepts of
conventional computer networking, with the necessary
customization to suit the demands of SoCs.





SoCs consist of heterogeneous IP cores such as video
processors, image processors, memory blocks, etc. Each of
these cores is connected to the NoC through a network interface
or network adapter module. The NoC contains a network of
routers responsible for end-to-end delivery of the packets from
the IP cores. The communication demands of these IP cores vary
depending on the application running on them. The network
interface provides seamless integration of these IP cores and
the network. Locating the interconnect logic closest to each IP
block results in fewer gates, fewer and shorter wires, and a
more compact chip floor plan. Having the option to configure
each connection's width and each transaction's dynamic
priority assures that latency and bandwidth requirements are met.
The routers are connected to each other through links. The
origin of NoC has also been viewed as a paradigm shift from
computation-centric to communication-centric design, as well as
the implementation of scalable communication structures. The
modular architecture of NoC makes the chip structure highly
scalable, and the well-controlled electrical parameters of the
modular blocks improve reliability.

The network communication latency depends on the
characteristics of the target application, the computational
elements and the network characteristics (e.g., network bandwidth
and buffer size) [2]. First of all, the target applications and
their associated traffic patterns and bandwidth requirements for
each node in the network are determined. This application
partitioning and knowledge of the overall system architecture
significantly impact the network traffic and help determine the
optimal network topology. An optimal network topology has an
immense impact on design cost, power and performance, and helps
designers choose effective and efficient routing algorithms and
flow control schemes to manage incoming traffic.

The design space of a NoC is very large, and includes topology 
choice (mesh, torus, star, etc.), circuit switched or packet 
switched, and other parameters (link widths, frequency, etc.). 
Because the traffic patterns of most SoCs can be known, a
custom generated network topology and physical placement of
components yields better performance and power than a
regular-pattern network [4]. A NoC's buffers and links can
consume nearly 75% of the total NoC power [5]; thus there is
significant benefit in optimizing the buffer size, link length and
bandwidth of a NoC design.

Generally speaking, determining the optimal topology to 
implement any given application does not have a known 
theoretical solution. Although the synthesis of customized 
architectures is desirable for improved performance, power
consumption and reduced area, altering the regular grid-like
structure brings into the picture significant implementation
issues, such as floor planning, uneven wire lengths (hence,
poorly controlled electrical parameters), etc. Consequently,
ways to determine efficient topologies that trade off high-level
performance issues against detailed implementation constraints
at the micro- or nano-scale level need to be developed.


BACKGROUND 
Network-on-Chip (NoC) is an emerging paradigm using packet
switched networks for communications within large VLSI
systems-on-chip. NoCs are poised to provide enhanced
performance, scalability, modularity, and design productivity
as compared with previous communication architectures such
as busses and dedicated signal wires. With the emergence of
large numbers of cores in general-purpose and system-on-chip
(SoC) designs, NoCs are likely to be the prevailing on-chip
interconnect fabric [6].

The early work and basic principles of the NoC paradigm were
outlined in various seminal articles, for example [7-17], and a
few textbooks [18-20]. However, the aforementioned sources do
not present many implementation examples or conclusions.
Networking concepts from the domains of telecommunication
and parallel computing do not apply directly on chip. From a
networking perspective, they require adaptation because of the
unique nature of VLSI constraints and costs: e.g., area and power
minimization are essential, buffer space in on-chip switches is
limited, latency is very important, etc. At the same time, there
are new degrees of freedom available to the network designer,
such as the ability to modify the placement of network
endpoints. From the viewpoint of the VLSI designer, many well
understood problems in the realm of chip development
methodology get a new slant when they are formulated for a
NoC-based system, and new trade-offs need to be explored.
Therefore, the field offers opportunities for novel solutions in
network engineering as well as system architecture, circuit
technology, and design automation [6].
Current complex on-chip systems are also modular, but most
often the modules are interconnected by an on-chip bus. The
bus is a communication solution inherited from the design of
large board- or rack-systems in the 1990s. It has been adapted
to the SoC specifics and currently several widely adopted
on-chip bus specifications are available [31-34].

While the bus facilitates modularity by defining a standard
interface, it has major disadvantages. Firstly, a bus does not
structure the global wires and does not keep them short. Bus
wires may span the entire chip area, and to meet constraints like
area and speed the bus layout has to be customized [35]. Long
wires also make buses inefficient from an energy point of view
[36]. Secondly, a bus offers poor scalability. Increasing the
number of modules on-chip only increases the communication
demands, but the bus bandwidth stays the same. Therefore, as
the systems grow in size with the technology, the bus will
become a system bottleneck because of its limited bandwidth.

Recently, network-on-chip (NoC) architectures have been emerging
as a candidate for the highly scalable, reliable, and modular
on-chip communication infrastructure platform [11]. The NoC
architecture uses layered protocols and packet-switched
networks which consist of on-chip routers, links, and network
interfaces on a predefined topology. There have been many
architectural and theoretical studies on NoCs such as design
methodology [10], [11], topology exploration [21],
Quality-of-Service (QoS) guarantee [22], resource management by
software [23], and test and verification [24].

In large-scale SoCs, the power consumption of the
communication infrastructure should be minimized for reliable,
feasible, and cost-efficient implementations. However, little
research has been reported on energy- and power-efficient NoCs at
the circuit or implementation level, since most previous works
have taken a top-down approach and did not address
issues at the physical level, staying instead at a high-level
analysis. Although a few of them were implemented and verified in
silicon [25], [26], they focused only on performance and
scalability issues rather than on power efficiency, which is one
of the most crucial issues for practical application to SoC
design.

Network-on-Chip (NoC) architectures employing packet-based
communication are being increasingly adopted in System-on-Chip
(SoC) designs. In addition to providing high performance,
the fault tolerance and reliability of these networks are becoming
critical issues due to several artifacts of deep sub-micron
technologies. Consequently, it is important for a designer to
have access to fast methods for evaluating the performance,
reliability, and energy efficiency of an on-chip network [27].

While on-chip networks have been proposed and studied in the
academic literature, to date there have been very few
implementations of routed on-chip networks. Dally and Towles
[10] proposed a 2D torus network as a replacement for global
interconnect. They claim that on-chip network modularity
would shorten the design time and reduce the wire routing
complexity. On-chip routed networks have also been proposed
for use in SoCs, such as in CLICHÉ [12], in which a 2D mesh
network is proposed to interconnect a heterogeneous array of
IP blocks.
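
To make the difference between the two regular topologies mentioned above concrete, the sketch below compares the longest shortest path (the network diameter, in hops) of a 2D mesh and a 2D torus. It assumes minimal routing and one router per grid position; the function names are hypothetical, and the sketch does not model the topology proposed in this paper.

```python
from itertools import product


def mesh_hops(a, b, cols, rows):
    """Minimal hop count between routers a and b in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a, b, cols, rows):
    """Minimal hop count in a 2D torus, where each row and column wraps around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, cols - dx) + min(dy, rows - dy)


def diameter(hops, cols, rows):
    """Longest minimal path over all pairs of routers."""
    nodes = list(product(range(cols), range(rows)))
    return max(hops(a, b, cols, rows) for a in nodes for b in nodes)


print("4x4 mesh diameter: ", diameter(mesh_hops, 4, 4))
print("4x4 torus diameter:", diameter(torus_hops, 4, 4))
```

For a 4x4 grid the mesh has a diameter of 6 hops and the torus 4 hops; the wraparound links shorten the longest path at the cost of longer wires.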

A performance analysis also shows that dynamic resource 
allocation leads to the lowest network latencies, while static 
allocation may be used to meet QoS goals. Combining the 
power and performance figures then allows an energy-latency 
product to be calculated to judge the efficiency of each
network [28].

In his work, Nikolay K. Kavaldjiev used a run-time
reconfigurable NoC for streaming DSP applications, taking
advantage of a global communication architecture that avoids
limitations by structuring and shortening the global wires. He
also proposed an architecture for a virtual channel router which,
in contrast to conventional architectures, is able to provide
predictable communication services and has a lower
implementation area and cost.
Dynamic reconfiguration is essential to support the
dynamically changing demands of the application domain: the
system operates in a constantly changing environment. The
user demands change (e.g., starting/terminating applications),
the environmental conditions change (e.g., available networks,
wireless channel conditions) and the available power budget
also changes (decreasing battery budget or connected to the
mains). The set of running applications and tasks in the system
adapts dynamically to these changes. The run-time
reconfiguration modifies the system communication demands.
For example, a new data stream may be needed or some of the
old streams may be redirected or replaced. The NoC must be
able to handle such dynamically changing traffic conditions.
Run-time changes in part of the traffic must be possible without
disturbing the rest of the traffic. The network reconfiguration
time must be short enough to enable adequate system reaction
time, and reconfiguration must be transparent to the user [30].
The major goal of communication-centric design and NoC 
paradigm is to achieve greater design productivity and 
performance by handling the increasing parallelism, 
manufacturing complexity, wiring problems, and reliability. 
The three critical challenges for NoC according to Owens et al. 
are: power, latency, and CAD compatibility [17]. The key 
research areas in Network-on-Chip design can be summarized 
as [29]: 

Communication infrastructure: topology and link optimization, 
buffer sizing, floorplanning, clock domains, power 

Communication paradigm: routing, switching, flow control, 
quality of service, network interfaces 

Benchmarking and traffic characterization for design and 
runtime optimization 

Application mapping: task mapping/scheduling and IP 
component mapping. 


METHODOLOGY 
Network-on-Chip is a new paradigm for interconnecting 
today's heterogeneous IP core based System-on-Chips (SoCs). 
In SoCs, IP cores are connected to a network of routers using 
network interfaces, and the network is used for packet-switched 
on-chip communication. Conventional computer network design 
tools, such as the Packet Tracer 5.3 utility from Cisco, are used 
for network design and simulation. Packet Tracer provides a 
versatile practice and visualization environment for the design, 
configuration, and troubleshooting of network environments. 
Our work uses the same tool to compare two topologies. The 
2-D mesh is currently the most popular regular topology used 
for on-chip networks in tile-based architectures, because it 
perfectly matches the 2-D silicon surface and is easy to 
implement. However, a number of limitations have been 
reported in the open literature, especially for long distance 
traffic. In this type of topology, every node has a dedicated 
point-to-point link to every other node in the network. This 
means each link carries traffic only between the two nodes it 
connects. 
If N is the total number of nodes in the network, the number of 
links needed to connect these nodes in mesh = N(N-1)/2. Each 
node should have (N-1) I/O ports, as it requires a connection to 
every other node. 
The advantages are: 
> No traffic problem, as there are dedicated links. 
> Robust, as failure of one link does not affect the entire system. 
> Security, as data travels along a dedicated line. 
> Point-to-point links make fault identification easy. 
The disadvantages are: 
> The hardware is expensive, as there is a dedicated link 
for any two nodes and each device should have (N-1) 
I/O ports. 
> There is a mesh of wiring which can be difficult to 
manage. 
> Installation is complex, as each node is connected to 
every node. 
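
The link and port counts stated above for a fully connected topology can be checked with a short calculation. The following is a minimal sketch; the function names `full_mesh_links` and `ports_per_node` are our own illustrative choices and are not part of Packet Tracer or of the original work.

```python
def full_mesh_links(n: int) -> int:
    """Links needed so that every pair of n nodes has a dedicated link: N(N-1)/2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (n - 1) // 2


def ports_per_node(n: int) -> int:
    """I/O ports each node needs to reach every other node directly: N-1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n - 1


if __name__ == "__main__":
    # Node counts below are arbitrary illustrative values.
    for n in (4, 8, 16):
        print(f"N={n}: links={full_mesh_links(n)}, ports per node={ports_per_node(n)}")
```

The quadratic growth of the link count with N is what makes the wiring and port cost listed among the disadvantages significant for larger networks.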

Earlier studies have shown that maximum power is 
consumed by links and interconnect infrastructure. Reducing 
interconnects and links will result in lower power consumption 
but can also affect performance and reliability negatively. 
The topology suggested by us reduces the number of links, thus 
resulting in lower power consumption while keeping the same 
level of reliability and performance. 


SIMULATION 

As shown in Figure 1 and Figure 2, the number of links in the 
mesh topology is 24, while in the proposed topology the number 
of links is 20. The number of hops a packet traverses on the 
longest path is 5 in Figure 1 and 4 in Figure 2. The time taken 
by a packet to traverse the longest path is 0.014 sec in the mesh 
topology and 0.009 sec in the proposed topology, as shown in 
Table 1 and Table 2. The percentage reduction in the time a 
packet takes on the longest path is 35.7%. 
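
The reported percentage can be reproduced from the figures quoted above. The sketch below uses only the values stated in the text (24 vs. 20 links, 0.014 sec vs. 0.009 sec); the helper name `percent_reduction` is an assumption made for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, expressed in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Values taken from the simulation results quoted in the text.
    print("time reduction (%):", round(percent_reduction(0.014, 0.009), 1))
    print("link reduction (%):", round(percent_reduction(24, 20), 1))
```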
Table 1: Result analysis of Mesh topology 
Table 3: Result analysis of proposed topology 2 


Comparison of proposed topology 2 and 3 shows further 
improvement in total flight time in traversal of a packet on the 
longest path. Addition of one link between R17 and R16 
reduces the traversal time as well as number of hops. 
CONCLUSION 

The results achieved in terms of time and reduction in the number 
of links are encouraging and motivate us to take the work 
further. As discussed earlier, NoC technology can borrow tools 
and techniques from conventional computer network technology 
with the required customization. In our future work, we intend 
to test the same on a standard NoC benchmark. Other NoC 
design parameters will also be explored. 
Efficiency Metrics 


ABSTRACT 

Software measurement is a challenging but essential component 
of a healthy and highly capable software engineering culture. It 
is an integral part of the state of the practice in software 
engineering. More and more customers are specifying software 
and/or quality metrics reporting as part of their contractual 
requirements. Software engineering has always been a matter 
of concern for every individual involved in software 
development, from the analysis phase to the delivery phase or 
even at maintenance time. There have been novel 
approaches for developing program complexity metrics. In this 
regard, we have proposed the Efficiency Metrics, which can 
calculate the efficiency of a programmer and can also calculate 
the exact time taken by the development team to complete the 
software development under various complexities. 

KEYWORDS 

LOC, Mean, Standard Deviation, Low, Medium, High, Errors, 
Delay Time, Committed Time. 


1. INTRODUCTION 

There have been novel approaches for developing program 
complexity metrics. The first, developed by Halstead [16], uses 
a series of software science equations to measure the complexity 
of a program. McCabe [17] uses graph theoretic measures to 
define a cyclomatic complexity metric. Albrecht [18] 
hypothesized that the amount of function to be provided by an 
application program can be estimated from an itemization of the 
major components of data to be used or provided by it. In this 
regard we have proposed the Efficiency Metrics, which can 
calculate the efficiency of a programmer and can also calculate 
the exact time taken by the development team to complete the 
software development under various complexities. Fear is often 
a software practitioner's first reaction to a new metrics program. 
People are afraid the data will be used against them, that it will 
take too much time to collect and analyze the data, or that the 
team will fixate on getting the numbers right rather than on 
building good software [20]. Creating a software measurement 
culture and overcoming such resistance will take diligent, 
congruent steering by managers who are committed to 
measurement and sensitive to these concerns. Software metrics, 
presented in various textbooks, e.g. [11],[12],[13],[14], and 
conferences and workshops [12], has a long tradition in theory, 
while considerably shorter in terms of industrial applications. 
Software metrics relies on the underlying theory, called 
representational measurement theory, which poses some 
requirements on the correct definition, validation, and use of 
software metrics. From a practical point of view, there are 
several further questions of importance, e.g. how to identify the 
right metrics to use, how to introduce a metrics programme, and 
how to keep it alive. Software process and product metrics are 
quantitative measures that enable software people to gain insight 
into the efficacy of the software process and the projects that are 
conducted using the process as a framework. Basic quality and 
productivity data is collected. This data is then analyzed, 
compared against past averages, and assessed to determine 
whether quality and productivity improvements have occurred 
or not [7]. Metrics are also used to pinpoint problem areas so 
that remedies can be developed and the software process can be 
improved [5]. 


A comparison of software metrics by Halstead, McCabe and 
Albrecht, in terms of their ability to measure software 
productivity, has led to the conclusion that in the areas where it 
is applicable, the function point metric is the best of the 
three [14]. It should be noted that the values of Halstead's 
metrics become available only after the coding is done and 
therefore can be of use only during the testing and maintenance 
phase. The increasing demand of the software industry across 
the globe is that it needs both the development of improved 
software metrics and improved utilization of such metrics [1]. 
Software metrics can be classified into Product Metrics & 
Process Metrics or Objective Metrics & Subjective Metrics. On 
these bases many Software Models and Software Metrics have 
been proposed, like Size Metrics by Boehm & Johns [8], 
Function Point Metrics by Albrecht, Bang Metrics by DeMarco, 
Information Flow Metrics by Kafura & Henry etc. 
Measurement is the process by which numbers or symbols are 
assigned to attributes of entities in the real world in such a way 
as to describe them according to clearly defined unambiguous 
rules. A good measurement program is an investment in success 
by facilitating early detection of problems, and by providing 
quantitative clarification of critical development issues. Metrics 
give you the ability to identify, resolve, and/or curtail risk issues 
before they surface. Measurement must not be a goal in itself. It 
must be integrated into the total software life cycle, not 
independent of it [10]. Different types of measurement for 
different parameters of a software product are possible through 
different types of metrics. The proposed research work is an 
effort to present a Delay metrics, which will solve the problem 
of time delay in software development. 


2. RELATED WORK 

Many researchers have been working on the exact time of 
development and they have also succeeded to some extent. 
Halstead and Raleigh have been able to find the development 
time; however, the results would have been more accurate had 
the efficiency of the programmers also been taken into 
consideration [1][2]. Goal-question-metric (GQM) is an 
effective technique for reducing the average time to close a 
defect by 40 percent within three months. However, it too lacks 
the programmer's efficiency in its calculations [4]. Because the 
distribution of reasons for delay varied widely from one 
department to another, it is recommended that every department 
should gain an insight into its reasons for delay in order to be 
able to take adequate actions for improvement [2]. In software 
engineering, especially in the field of software metrics, 
the success rate is not that good because most software 
development companies avoid following the proposed metrics. 
Project initiation is a good time to choose the appropriate 
measures that will help developers to assess project performance 
and product quality [6]. Planning measurement activities 
carefully takes significant initial effort to implement, 
and the payoff comes over time [3]. 


Yin and Winchester's Metrics [15], which depend on design 
structure, can be useful in identifying sections of a design that 
may cause problems during coding, debugging, integration and 
modification. These metrics are available from the design phase 
onwards and hence can be used to predict values like the 
number of errors in the system, time for system testing, time for 
rectification of errors etc. Henry and Kafura's Metrics [15] are 
an appropriate and practical basis for measuring large scale 
systems. The major elements in the information flow analysis 
can be directly determined at design time, thereby allowing any 
corrections in the system structure at minimum cost. Also, 
by observing the patterns of communication among system 
components, it is possible to define measurements for 
complexity, module coupling, level interactions and stress 
points in the design. These critical system qualities cannot be 
derived from simple lexical measures. In a nutshell, we can say 
that this metric determines the complexity of a procedure, 
which depends on two factors: the complexity of the procedure 
code and the complexity of the procedure's connections to its 
environment [15]. Once the errors are predicted by Yin and 
Winchester's Metrics and the complexity of the code is 
calculated by Henry and Kafura's metrics, there is a need to 
develop a metric which will calculate the exact time of 
development, with the complexity of a procedure or program as 
an important parameter [19]. 
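
The two factors named above can be made concrete with the commonly cited form of Henry and Kafura's information flow measure, length x (fan-in x fan-out)^2. The sketch below assumes that standard form; the function name and the sample numbers are hypothetical and are not taken from this paper.

```python
def information_flow_complexity(length: int, fan_in: int, fan_out: int) -> int:
    """Henry-Kafura style complexity of one procedure.

    length  -- size of the procedure code (e.g. lines of code)
    fan_in  -- number of information flows into the procedure
    fan_out -- number of information flows out of the procedure
    """
    if length < 0 or fan_in < 0 or fan_out < 0:
        raise ValueError("arguments must be non-negative")
    # Code complexity (length) weighted by the squared connectivity term.
    return length * (fan_in * fan_out) ** 2


if __name__ == "__main__":
    # Hypothetical procedure: 10 lines, 2 incoming flows, 3 outgoing flows.
    print(information_flow_complexity(10, 2, 3))
```

Because the connectivity term is squared, a short but heavily connected procedure can score higher than a long, isolated one, which is why such stress points are visible already at design time.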


Background of the early depicted Software Models: 
COCOMO Model 


The most fundamental calculation in the COCOMO model is 
the use of the Effort Equation to estimate the number of 
Person-Months required to develop a project. Most of the other 
COCOMO results, including the estimates for Requirements 
and Maintenance, are derived from this quantity. The original 
COCOMO [8] model was defined in terms of Delivered Source 
Instructions, which are very similar to SLOC. The major 
difference between DSI and SLOC is that a single Source Line 
of Code may be several physical lines. For example, an 
"if-then-else" statement would be counted as one SLOC, but 
might be counted as several DSI. However, the efficiency of the 
programmer is not taken into consideration while performing 
such calculations to meet the deadlines of the client. 
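
As an illustration of the Effort Equation, the sketch below uses the basic COCOMO 81 form, Effort = a x (KLOC)^b person-months, with Boehm's published coefficients for the three project modes. The paper does not state which COCOMO variant it has in mind, so the choice of the basic model, the function name and the sample size are assumptions.

```python
# Basic COCOMO 81 coefficients (a, b) per project mode.
COCOMO_BASIC = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort(kloc: float, mode: str = "organic") -> float:
    """Estimated effort in person-months for a project of `kloc` thousand lines."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b = COCOMO_BASIC[mode]
    return a * kloc ** b


if __name__ == "__main__":
    # Hypothetical 10 KLOC project, evaluated in each mode.
    for mode in COCOMO_BASIC:
        print(mode, round(cocomo_effort(10.0, mode), 1))
```

Note that nothing in this equation depends on who writes the code, which is the gap the proposed efficiency metric is meant to address.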


Waterfall model 


The waterfall model, however, is argued by many to be a bad 
idea in practice, mainly because of their belief that it is 
impossible to get one phase of a software product's lifecycle 
"perfected" before moving on to the next phases and learning 
from them. A typical problem is when requirements change 
midway through, resulting in a lot of time and effort being 
invalidated due to the "Big Design Up Front". Only a certain 
number of team members will be qualified for each phase, 
which can lead at times to some team members being inactive. 
Had the programmers' efficiency been checked before handing 
this job over to them, the project manager could have assigned 
high efficiency programmers for coding. 


Spiral model 

In the spiral model the software is developed in a series of 
incremental releases, with the early stages being either paper 
models or prototypes. Later iterations become increasingly more 
complete versions of the product. The major flaws identified in 
the spiral model are that it demands considerable risk-assessment 
expertise and has not been employed as much as proven models. 


Java Execution model 

Though this model can check the performance of software 
developed in Java, it still lacks the time and efficiency 
constraints [21]. 

All of these models, COCOMO, Waterfall and Spiral, have been 
run in the software industry, but when there are sharp deadlines 
for the completion of the project set by the client, such models 
become obsolete without housing the efficiency metrics. 


Relation with Defect Removal Efficiency 

Defect Removal Efficiency (DRE) is a measure of the efficacy 
of your SQA activities. For example, if the DRE is low during 
analysis and design, it means you should spend time improving 
the way you conduct formal technical reviews. 
DRE = E / (E + D) 
where E = number of errors found before delivery of the software 
and D = number of errors found after delivery of the software [22]. 


Remedy: If DRE is low during analysis and design, we could 
find the efficiency of programmers and put the best ones on 
coding to meet the deadlines of the client in a time-bound 
and result-oriented fashion. 
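
The DRE formula above can be evaluated directly. The following is a minimal sketch; the function name and the sample error counts are hypothetical.

```python
def defect_removal_efficiency(errors_before: int, errors_after: int) -> float:
    """DRE = E / (E + D), with E found before delivery and D found after."""
    if errors_before < 0 or errors_after < 0:
        raise ValueError("error counts must be non-negative")
    total = errors_before + errors_after
    if total == 0:
        raise ValueError("DRE is undefined when no errors were found at all")
    return errors_before / total


if __name__ == "__main__":
    # Hypothetical project: 90 errors caught before delivery, 10 reported after.
    print(defect_removal_efficiency(90, 10))
```

A value close to 1 indicates that most errors were removed before delivery; a lower value points at the review activities mentioned above.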


Feature Performance Metrics 

Firstly, relative value is measured by the impact that each 
feature has on customer acquisition and retention. Secondly, 
feature value is compared to feature cost and specifically 
development investment to determine feature profitability. 
Thirdly, feature sensitivity is measured. Feature sensitivity is 
defined as the effect a fixed amount of development investment 
has on value in a given time. Fourthly, features are segmented 
according to their location relative to the value to cost trend line 
into: most valuable features, outperforming, underperforming 
and fledglings. Finally, results are analyzed to determine future 
action.[23] 


3. PROPOSED WORK 
Suppose a company hires twenty programmers. Although 
their language skills, technical knowledge and aptitude are 
checked by the recruitment team, it is not necessarily the case 
that all of them have the same expertise in a particular 
programming language, or the same level of aptitude and typing 
skills. So it is necessary to check their efficiency before 
assigning them to projects. Based on the efficiency, the 
workforce management team of the organization shall assign 
each programmer a particular module of development where 
he/she can give their best with less assistance. If we don't 
measure our current performance and use the data to improve 
our future work estimates, those estimates will just be guesses, 
because today's current data becomes tomorrow's historical data. 


We have tried this efficiency metric at the initial phase of the 
software, after analysis. The team leader (project in charge) 
took up the manpower for his assigned project based on this 
efficiency metric. He picked the people whose efficiency was 
rated (7-9) for very complex modules, (4-6) for normal modules 
and (3-4) for easy modules, be it designing or coding. 


In this paper we propose an efficiency metric in which we are 
using three constants: 
The proposed efficiency metric is defined as: 





Where, 


E(Prog) is the efficiency of a programmer in a project. 

F is the function complexity. 

LOC is the lines of code developed for the assigned function. 

Pe is the programmer's status. 

T(c) is the total time consumed (in minutes) for developing the 
lines of code. 

e is an efficiency constant and its value is 100. 


4. EXPERIMENT 
In Table 1, the value in the fifth column is the efficiency of the 
programmer, obtained by the proposed metric. 
Table 1 
A Table (Sample Data) calculating the Efficiency of a 
Programmer for a software development project 
Figure 1 shows the efficiency of different programmers in 
developing functions of different complexities. 


By having a look at the chart above, it is clear that the 
efficiency of programmers does not vary much when we need to 
develop programs of simple complexity; however, there is much 
A relation between time taken and Efficiency: 

We have analyzed the data of DE InfoTech (a software 
company of repute) as depicted in Table 2, where programmers 
of any status are given the suite to develop, and we have found 
that the more time taken for development of code, the lower the 
efficiency of the programmer. Table 2 is an extraction 
of the two parameters Time and Efficiency from Table 1. 
From Table 2, we have drawn the following bar 
graph, which clearly shows that the efficiency of a programmer 
is inversely proportional to time: 

$$E(\mathrm{Prog}) \propto \frac{1}{T(c)}$$

Where, 
T(c) is the time consumed in development and 
E(Prog) is the efficiency of the programmer. 


5. CONCLUSION 


Though changes in the analysis, design and code are certain, 
we can still calculate the efficiency of the manpower 
(programmers) before we involve them in a development 
project. We shall be able to reap better results by 
assessing the past development data from the knowledge bases 
of various companies and learning from the development 
hurdles which they have faced. The programmer's efficiency 
table shall be able to calculate the efficiency of the programmer 
to an appropriate level based on his aptitude, typing and 
programming skills. This efficiency shall allow us to forecast 
the manpower required for development of a project under a 
certain level of complexity, to be very close to the deadline of 
the client. 


6. FUTURE SCOPE 

Since Yin and Winchester's metrics play a vital role in the 
design phase of software development, Henry and Kafura's 
metrics serve as a base for our efficiency metrics, as they help 
us to assess the complexity of a procedure. Both these metrics 
are helpful till the design phase; however, they become obsolete 
when we enter the coding domain of software development. So 
our efficiency metrics will help us to a great extent in the coding 
part of the software development process. However, the 
proposed software metrics are rarely followed by companies 
of repute, for reasons best known to them [6]. So it 
would be better if all the software companies of repute tie up 
with good academic institutions so that researchers get the 
exact past development data to come up with an appropriate 
knowledge base, which will help us to make future software 
metrics to maintain and manage domestic and global deadlines. 
ABSTRACT 

Enterprise Resource Planning was a term restricted purely to elite 
class. ERP for small business calls for voluminous investments. 
But the question that kept ringing in the market was can everyone 
afford it? The answer was a stubborn no initially but not anymore. 
The world is changing, and new opportunities are appearing every 
day. Globalization, once the domain for only large companies, is 
now presenting new markets for growth for small to mid market 
companies. 

In today's competitive manufacturing environment, it takes 
more than quick fixes, outsourcing and downsizing to consistently 
achieve growth and profit objectives. While these options may 
yield temporary financial relief, they will not lead the way to long- 
term growth and profitability. For companies to grow and 
consistently exceed bottom line expectations, they need to get lean. 
And to get lean they should master eight basics of Lean Six Sigma. 
Today every organization strives to optimize its operations, further 
based on the type of problems, combining Lean and/or Six Sigma 
tools with traditional project management techniques for ERP 
Implementation can be a powerful combination for ERP 
Sustainability in Small & Medium Enterprises. 


KEYWORDS 
ERP, Lean, Six Sigma, SIPOC, DMAIC, DMADV, TOC, BPI, 
Process Benchmarking, STOPE etc. 


1. INTRODUCTION 

Profit = (Price - Cost) x Volume 

Profit with growth remains top of mind as Small and Medium 
Enterprises (SMEs) develop Enterprise Resource Planning (ERP) 
strategies. For years SMEs have followed the lead of the larger 
corporations in terms of "how" and "what" to select regarding ERP 
systems. That leadership role is currently faltering due to 
corporate disillusionment with single 'corporate standard' 
implementation and a corporate focus shift from large scale 
purchases to integration. The results are scrap, rework and 
warranty costs that negatively impact profitability, and quality 
and shipment problems that deliver less than acceptable customer 
satisfaction. ERP implementations represent high-risk projects 
that need to be managed properly. Small and medium 
organizations must identify when in the process to address these 
risks effectively to ensure that the promised benefits can be 
realized and potential failures can be avoided [1][2]. 

Once having taken the hurdles and having decided to fend 
for themselves, the SME buyers should be more focused 
and relevant. To get to the root causes of failed ERP 
implementation, what is required first is a company-wide, 
in-depth understanding of the fundamentals of Six Sigma 
and then a total commitment to the consistent and 
tenacious execution of the eight basics of Lean Six Sigma 
[20]. 


Putting Together IT, Leadership & 
Management 





Figure 1: Process, Tools and Business Results 


As shown above in Figure 1, this research paper is not 
about Lean Manufacturing, TOC (Theory of Constraints), 
Six Sigma or ERP. It is about relating them functionally 
to each other; it is about synergy and interactions between 
these elements; and it is about their relationships to the 
rest of the business enterprise [5][25]. 


2. BACKGROUND 

Despite the large investment most SMEs make in ERP 
software, benefits are by no means guaranteed. Many 
industry leaders, including Panorama Consulting Group, 
have published papers regarding the evasive nature of 
ERP benefits. Their 2010 ERP Report states that 67.5% of 
companies surveyed fail to realize at least half of the 
business benefits they expected from their ERP 
systems [6][21]. In addition, over one in three companies 
surveyed (40%) experienced major operational disruptions 
after implementation go-live, such as the inability to ship 
products or to close the books. Finally, 71.5% of executives and 
67.1% of employees are at least somewhat satisfied with their ERP 
solutions. Factors that have a critical effect on the ROI of the ERP 
investment, as shown in Figure 2, should be carefully managed 
as part of an overall ERP benefits realization plan [23][24]. 
Figure 2: ERP Results (%) 


To know how to get the best from an ERP package, it is 
important to first analyze the key factors that are responsible for 
ERP failures. Some major factors are described below [3][17]: 

> Incorrect Expectations: In an ERP implementation, inaccurate 
expectations signify a lack of understanding of the complexities 
of the ERP implementation standard. Cost and schedule overruns 
are common. 

> Inaccurate Data: Accurate data is the lifeline of an ERP 
system. Experience has shown that at least 98% of inventory 
records and bills of material must be accurate to make the system 
usable to control the business. 

> Improper Gap Analysis: Lack of perfect tuning between IT 
professionals, business owners and end users only compounds 
the problem on the other side. 

> Inability to Calculate Hidden Costs: In addition to the cost of 
purchase, most organizations often fail to factor in hidden costs 
during evaluation, consulting, implementation, training, 
transition, delayed ROI and post implementation support. All 
the above factors can lead to cost overruns, schedule overruns 
and functionality overruns. This ultimately results in negative 
ROI and a prolonged payback period. 

> Elongated Implementation Time: It often leads to fatigue and 
a stressed, dubious state of mind in users, which affects the 
growth period of ERP to a great extent. 

> Inability to Accurately Map Business Processes: If the ERP 
package is implemented by professionals who do not have 
adequate knowledge about the business, it leads to improper 
mapping of the business processes. Since ERP systems attempt 
to get the most out of planned information, they are most useful 


mostly found when going live with the new system[7]. 
> Lack of Proper Monitoring System: It hampers the 
quality of the end system. As most ERP systems are not 
flexible and not ready to upgrade automatically, variation 
across systems leads to an improper flow of 
information that hampers quality decisions being taken in 
time. 

> Disheveled Knowledge Base: Companies often lack 
tools to capture and record the knowledge gained 
during implementation and to reuse it as a 
checklist. Thus redundancy of the same process often 

> Inadequate Training & Documentation: Several 
organizations often train users only during initial 
implementation stages and rarely provide additional 
training for new employees and those who have 
undertaken job rotations. Consequently, system 
knowledge and usage tend to dip significantly after 
implementation. Documentation is also scarce and 
poorly maintained [8][9]. 


3. PROCESS BENCHMARKING THROUGH BEST 
BPI FOR SMES ERP 


By Selecting the Best Business Process Improvement 
efforts, success is realized in a Lean Six Sigma 
deployment as depicted in Figure 3 below: 


Figure 3: Best BPI with Lean Six Sigma 


Lean eliminates non-value-added steps or waste from the 
process, while Six Sigma improves the quality of value-added 
steps by reducing the variability in the process. A six-sigma 
process is one in which 99.99966% of the products 
manufactured are free of defects, compared to a one-sigma 
process in which only 31% are free of 
defects [10][23]. Without the solid execution of Lean Six 
Sigma basics, companies will seldom achieve the full 
growth and profit potential of ERP. Here are the eight basics of 
Lean Six Sigma which every manager should know and implement 
[20][21]: 

(i) Information Integrity: It is not uncommon for front office 
management to become disenchanted with computerized systems 
results when time schedules and promised paybacks are not 
achieved. It is a given that acceptable systems results cannot be 
achieved when systems are driven by inaccurate data and untimely, 
uncontrolled documentation. 


(ii) Performance Management: Measurement systems can be 
motivational or de-motivational. The individual goal-setting of the 
1980s is a good example of de-motivational measurement - it 
tested one individual or group against the other and while 
satisfying some individual egos, it provided little contribution to 
overall company growth and profit. Today, the balanced scorecard 
is the choice of business winners. 


(iii) Sequential Production: It takes more than systems 
sophistication for manufacturing companies to gain control of 
operations. To achieve on-time shipments at healthy profit 
margins, companies need to replace obsolete shop scheduling 
methodology with the simplicity of sequential production. 
Manufacturing leaders have replaced their shop order "launch and 
expedite" methodology with continuous production lines that are 
supported by real-time, visual material supply chains, that is, 
sequential production. The assertion that sequential production 
only works in high production, widget-manufacturing 
environments is a myth. 


(iv) Point-of-Use Logistics: Material handling and storage are two 
of manufacturing's high-cost, non-value-added activities. The 
elimination of the stock room, as it is known today, should be a 
strategic objective of all manufacturers. Moving production parts 
and components from the stockroom to their production point of 
use is truly a return to basics and a significant cost reducer. 


(v) Cycle Time Management: Long cycle times are symptoms of 
poor manufacturing performance and high non-value-added 
costs[11]. Manufacturers need to focus on the continuous reduction 
of all cycle times. Achieving success requires a specific 
management style that focuses on root cause, proactive problem 
solving, rather than "fire-fighting". 


(vi) Production Linearity: Companies will never achieve their 
full profit potential if they produce more than 25 percent of their 
monthly shipment plan in the last week of the month or more than 
33 percent of their quarterly shipment plan in the last month of the 
quarter. As companies struggle to remain competitive, one of the 
strategies by which gains in speed, quality and costs can be 
achieved is to form teams of employees to pursue and achieve 
linear production. 
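
The two linearity thresholds above (25 percent of the monthly plan in the last week, 33 percent of the quarterly plan in the last month) can be checked mechanically. The following is a minimal sketch only; the function names and the shipment figures are hypothetical and do not come from the paper.

```python
def last_period_share(shipments):
    """Return the fraction of total shipments that fall in the final period."""
    total = sum(shipments)
    if total == 0:
        raise ValueError("no shipments recorded")
    return shipments[-1] / total


def is_linear(shipments, threshold):
    """True when the final period's share does not exceed the threshold."""
    return last_period_share(shipments) <= threshold


# Hypothetical data: units shipped per week of one month,
# and units shipped per month of one quarter.
weekly_units = [200, 220, 230, 350]
monthly_units = [1000, 1000, 950]

print(is_linear(weekly_units, 0.25))   # last week carries 35% of the month
print(is_linear(monthly_units, 0.33))  # last month carries about 32% of the quarter
```

A team pursuing linear production could track these two shares period by period and treat any value above the threshold as a trigger for root-cause analysis.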


(vii) Resource Planning: One of the major challenges in industry 
today is the timely right sizing of operations. Profit margins can be 
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eroded by not taking timely downsizing actions, and 
market windows can be missed and customers lost by not 
upsizing the direct labour force in a timely manner. These 
actions demand timely, tough decisions that require 
accurate, well-timed and reliable resource information. 


(viii) Customer Satisfaction: It does no good to have the 
best products and services if the customer's perception of 
"as received" quality and service is unsatisfactory. 
Companies need to plan and implement proactive projects 
that break down the communication barriers that stand in 
the way of impressive customer perceptions [12][13]. 


4. NEW TECHNOLOGY FOR THE NEXT DECADE 
Lean Six Sigma is a relatively new quality improvement 
methodology resulting from the combination of the 
individual Lean and Six Sigma methodologies. It started 
in the late 1990s when both AlliedSignal and Maytag 
began cross-training employees in the two frameworks 
and combined aspects of each. A focus on Lean occurs 
when short-term gains are desired and business leaders 
believe that a value stream map will reveal appropriate 
solutions; Six Sigma is preferred when the problem is not 
obvious, and/or when a longer time frame is required. 
Lean focuses on eliminating waste from processes 
and increasing process speed by focusing on what 
customers actually consider quality, and working back 
from that. Lean methods include Value Stream Mapping, 
which involves clarifying the customer base, listing the 
process steps, establishing which steps are value-add, and 
ensuring that those steps flow without 
interruption [18][19]. 
Six Sigma is a business management strategy originally 
developed by Motorola, USA in 1981. As of 
2010, it enjoys widespread application in many 
sectors of industry. Six Sigma is a rigorous and a 
systematic business management methodology that 
utilizes information and statistical analysis to measure and 
improve a company's operational performance, practices 
and systems by identifying and preventing ‘defects’ in 
manufacturing and service-related processes in order to 
anticipate and exceed expectations of all stakeholders to 
accomplish effectiveness. 
Six Sigma projects follow one of two project 
methodologies, DMAIC and DMADV. While DMAIC 
is used for projects aimed at improving an existing 
business process, DMADV is used for projects aimed at 
creating new product or process designs [13][19][20]. 
4.1 The DMAIC Project Methodology 


The DMAIC project methodology has five phases, as 
mentioned below: 


(i) Define the problem, the voice of the customer and the 
project goals specifically. Design goals that are consistent 
with customer demands and the enterprise strategy. The results of 
the Define phase go into the Project Charter as the goals, 
objectives and deliverables of the project, as shown in Figure 4 
below: 

Figure 4: Six Sigma Project Charter Template 

(ii) Measure key aspects of the current process and collect relevant 
data. Measure, Measure, Measure. It is often said that we can't 
achieve what we don't measure, and it's true. It is important to 
establish post-go-live ERP performance measures. This step is the 
key to an effective ERP benefits realization program. At the end of 
the Measure phase, one should have a detailed process map that 
clearly shows how our process is currently performed, as well as 
data and charts that tell how well this process meets customer 
requirements. 

Figure 5: Six Sigma Measure Phase 

It is critical that the metric be Real, Reliable and 
Repeatable. 'Real', because it must be relevant to the 
business. The metric must address a real business problem 
and measure it in business terms. The metric must be 
'Reliable', in the sense that it leaves no room for doubt 
and includes a drill-down to any underlying facts. Lastly, 
the metric must be 'Repeatable', because you will need 
to show historical trends in order to show the progress of 
the Master Data Management program. 

(iii) Analyze the data to investigate and verify Cause-and- 
Effect Relationships. Determine what the relationships 
are, and attempt to ensure that all factors have been 
considered. Seek out the root cause of the defect under 
investigation. 
During the Analyze phase, we might use an Ishikawa 
Fishbone Analysis (Cause-effect diagram) (Figure 6) to 
analyze the causes of disintegrated master data. We begin 
the fishbone by showing the undesirable effect of 
'Duplicate Disintegrated Customer Data' in a box on the 
right side of the diagram. Then we list the various causes 
that produce this effect, including Architecture causes, 
Governance causes, Organization causes and Process 
causes. 

Figure 6: Ishikawa Fishbone Analysis 

(iv) Improve or Optimize the current process based upon 
data analysis using techniques such as design of 
experiments, mistake proofing, and standard work to 
create a new, future state process. Set up pilot runs to 
establish process capability. We can use SIPOC (Supplier 
Input Process Output Customer) in the Improve phase to 
brainstorm improvements to the process. 
The SIPOC diagram in Figure 7 depicts the new improved process, 
'Unique ID Service', and lists Order Management as the supplier 
function. They supply the input of customer name that is matched 
in the 'Unique ID Service' into the output, 'Matched Customer'. 
Strategic Procurement might be the customer of this process. 

(v) Control the future state process to ensure that any deviations 
from target are corrected before they result in defects. Control 
systems are implemented, such as statistical process control, 
production boards and visual workplaces, and the process is 
continuously monitored, as depicted in Figure 8. 

4.2 The DMADV Project Methodology 
The DMADV project methodology, also known as DFSS 
(Design for Six Sigma), features five phases: 
* Define design goals that are consistent with customer demands 
and the enterprise strategy. 
* Measure and identify CTQs (characteristics that are Critical to 
Quality), product capabilities, production process capability, 
and risks. 
* Analyze to develop and design alternatives, create a high-level 
design and evaluate design capability to select the best design. 
* Design details, optimize the design, and plan for design 
verification. This phase may require simulations. 
* Verify the design, set up pilot runs, implement the production 
process and hand it over to the process owner(s). 

4.3 Difference between DMADV and DMAIC Methodology 
The difference between DMADV and DMAIC, as one can see now, 
exists only in the way the last two steps are handled. DMADV, 
instead of the Improve and Control steps, which focus on 
readjusting and controlling in one way or another, deals with 
redesigning the process to fit customer needs [27]. There is a new 
viewpoint in Six Sigma circles that DMADV is for designing new 
products and services and that it may not be successful on existing 
business processes and products. Although the argument is valid to 
some extent, it can be noticed that the I letter of DMAIC 
is not far removed from the D letter of DMADV. Here 
design is an extended concept of improvement. Let's 
simply put it the other way around. One can implement 
DMADV when we don't have an existing product, which 
we are aiming to create from scratch. The second 
occasion when we can think of using DMADV is when, in 
actual practice, DMAIC hasn't yielded the result you were 
looking for despite best efforts to make improvements. 

4.4 Which One is Better and When? 
In a nutshell, the latter reason can be summarized as: Use 
DMADV when process improvement either fails or 
doesn't deliver to your expectations. There are occasions 
when a planned DMAIC has ultimately turned into DMADV. 
Black Belts must take credit for this, in my 
view, as this reflects their in-depth subject knowledge. 
The combination of the rigor of Six Sigma with the 
simplicity and practicality of Lean Enterprise gives 
organizations a larger cadre of tools to solve a broader 
range of problems. The result is the faster creation of 
value at the lowest possible cost. But it is imperative that 
the lean mindset begins at software selection, 
continues through ERP implementation, and doesn't stop 
until well after go-live. 

5. Key Tools for Use While Identifying BPI Efforts for 
The two primary tools for identifying and prioritizing BPI 
efforts are the Tree Diagram and the Benefits/Effort 
Matrix. A Tree Diagram is simply a tool for organizing 
ideas (Figure 9). It branches off from the value drivers, 
which are major opportunity areas for value creation and 
Lean Six Sigma BPI efforts. Each value driver has many 
opportunity areas for BPI efforts. Many ideas that emerge 
from the opportunity areas are still too broad for a Lean 
Six Sigma BPI effort, and specific efforts must be 
identified. BPI effort ideas then go through the BPI effort 
selection process [28]. 

Figure 9: Tree Diagram of BPI for ERP Selection 

A Benefits/Effort Matrix helps practitioners prioritize the 
identified BPI efforts. 
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Figure 10: Benefits/Effort Matrix of BPI for ERP Selection 


Once practitioners identify BPI efforts, they should establish a 
labeling system, such as numbering them, and place them within 
the matrix. BPIs with high benefits that require low effort are the 
most desirable opportunities, while BPIs with low benefits but also 
low effort should be considered as potential quick hits. 
Opportunities that require high effort and offer low benefits are the 
least desirable. 
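
The placement rule described above can be sketched as a small classifier. This is an illustration under stated assumptions: the 1 to 10 scoring scale, the threshold of 5 and the function name are hypothetical, and the text does not rank the high-benefit, high-effort quadrant, so that case is only labeled as unranked.

```python
def classify_bpi(benefit, effort, threshold=5):
    """Place a BPI effort in a Benefits/Effort quadrant.

    benefit and effort are scores on an assumed 1-10 scale;
    a score at or above the threshold counts as "high".
    """
    high_benefit = benefit >= threshold
    high_effort = effort >= threshold
    if high_benefit and not high_effort:
        return "most desirable"
    if not high_benefit and not high_effort:
        return "potential quick hit"
    if not high_benefit and high_effort:
        return "least desirable"
    # High benefit with high effort is not ranked in the text.
    return "high benefit, high effort (not ranked in the text)"


# Hypothetical numbered BPI efforts: label -> (benefit, effort)
efforts = {1: (8, 2), 2: (3, 2), 3: (2, 9), 4: (9, 8)}
for label, (benefit, effort) in efforts.items():
    print(label, classify_bpi(benefit, effort))
```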


6. HOW SUCCESSFUL ERP SELECTIONS ARE MADE BY SME? 

The following are the top things to look for, look at, and look 
beyond when evaluating an ERP purchase. ERP selection is not just 
about wants and want-nots from the various people in the 
organization. It should be a long-lasting purchase that 
provides one with the feeling of a partnership [13][14]. 
One is not just buying software; one is also buying into a 
vendor and their company culture. 

The analysis has addressed some critical selection 
factors from the survey results conducted on SME project 
leaders for ERP implementation [4][17]. These 
selection factors are: 

a) System Functionality Requirements: Functionality 
of the system to suit the business. Systems will need to 
support a more integrated style of business processes, 
including womb-to-tomb management of 
company, contractor and supplier relationships. 

b) Business Drivers: Financial benefit to the company 
of the selected system. 

c) Cost Drivers: Direct cost of the implementation in 
terms of outlay and resources. 

d) Flexibility: Ability to tune or optimize the system to 
meet the unique requirements of the company. 

e) Scalability: Size of the system to suit the business and 
ability to grow with the business. 

f) Usability: Systems must support the emerging point- 
and-click generation. 

g) Reliability: Systems must achieve the goal of 
24 hours a day, seven days a week, with 99.9999 
percent availability as the backup goal. Systems also 
need to be safe and resilient 
[15][16]. 

h) Performance: Demand for faster response rates will 
grow as people tire of the World Wide Wait. 

i) Supportability: Systems must improve their 
capabilities in a smooth evolution rather than through 
disruptive changes. 

j) Integrity: Complexity will drive the movement 
toward component-based integration. 

7. BOTTOM LINES FOR SME BUYERS DEFINED 

Having cleared the hurdles and decided to fend 
for themselves, the SMB buyers should ask questions that are 
more focused and relevant. They should include [9][14]: 


* How scalable and how diverse is the potential vendor's 
product today? 

* Does the vendor support large as well as medium-sized and 
small businesses with one set of software? 

* Are they thinking about their customers and how they 
will assist them in crossing over the next technological 
paradigm shift? 

* Have they exhibited a track record of helping their customer 
base in the past over prior technological shifts? 

* Does the ERP software company have a general discrete focus, 
a niche focus or are they strong in both? 

* Does a vendor have a role in the high-growth 'legacy system 
modernization' market space? 

* Do they intend to extend their software with business 
intelligence and enterprise information integration initiatives 
that make it easier to talk to other ERP software? 


8. EVALUATING ERP SUSTAINABILITY & 
PERFORMANCE MEASUREMENTS IN SMES 

The company should have a scale for evaluation right from the 
beginning stage. The company must periodically make a note of 
the work done. Any discrepancies should be brought to the vendor's 
notice immediately [17][18]. The vendor should extend full 
cooperation in making sure that the work gets done as 
promised. Only then is it possible to scale ERP best practices. 


8.1 Calculating ROI 

ROI helps to account directly for the performance of ERP software 
programs. The ROI on ERP will not be achieved merely by ERP 
implementation. The returns will be achieved only if the 
procedures are followed properly. 
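
The paper does not give a formula, so the following is only a sketch using the generic definition of return on investment, net benefit divided by cost. The function name and the figures are hypothetical.

```python
def roi(total_benefit, total_cost):
    """Generic return on investment: net benefit as a fraction of cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical ERP project: 150,000 in realized benefits
# against 100,000 in licence, implementation and training cost.
print(f"{roi(150_000, 100_000):.0%}")
```

In practice the benefit figure only materializes when the procedures are followed, so the same calculation is worth repeating at intervals after go-live rather than once at purchase.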


8.2 Unfailingly Observing Contract Terms 
The performance of ERP software can be gauged on the basis of how 
it works in relation to the terms of the contract. ERP software 
that works in accordance with the contractual terms 
indicates better performance than software that does not. 


8.3 Customizing ERP Software 

Customizing is an integral part of ERP solutions. It is a crucial 
decision which needs to be taken by the organization, as it is 
decisive for ERP's success. The rate of customization is directly 
proportional to ERP success. Customization tends to pose a 
challenge to the time and the funds allocated. The challenge for 
management lies in balancing the two and making both 
ends meet. It is a difficult task, but the success speaks for the 
process. 


8.4 Enhancements through ERP Innovations 

The innovations of new ERP applications help users to include all 
the specific details in the ERP system itself. This means they don't 
have to input these details into the ERP system every time they 
log in. This also implies that the operators need not recompile the 
ERP software whenever there is a change in the attributes or 
methodology of the data fed. Customization has also helped the users 
to act independently rather than depending on the vendors 
whenever a modification is required. 
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Flgare10. Comparison of Sigma Levels with Cost of 
Quality 


8.5 Sound Knowledge about ERP System 

The features, be they old or new, modern or traditional, 
will not be of any use unless the users are aware of the 
ERP system's features and modalities. This knowledge 
has to be imparted to the end users as well as to IT 
personnel [19][20][22]. They should have a clear 
knowledge of the entire system at their fingertips. If 
questioned or asked, they must be capable of bringing 
a particular function into effect. The services of an 
ERP consultant will come in handy for the 
organization to supply this information to the users. The 
consultant will make a decision on the basis of the 
organizational needs and system configuration. 

9. DISCUSSIONS & CONCLUSION 
Does Lean Six Sigma work in smaller companies for 
better ERP implementation? This million-dollar question 
frequently surfaces when we talk about how the power of Lean 
relates to the installation of an ERP system in an 
organization. The typical response is: "we don't need a 
lean focus because our ERP system uses standard 
templates of best practices". This is the wrong answer. 
The templates for SAP, Oracle and others are generally 
not lean. They are structured, organized and SOX 
compliant, but not Lean. In no small measure this is due 
to ERP systems and their templates being transaction / 
data / planning / scheduling driven. Lean focuses on 
continuous cost reduction and process improvement with 
the minimum number of transactions and processes. 
Therefore it is best to remove the non-value-added 
activities and then insert the IT systems supporting the 
Lean operation [29]. Given how hard it is to alter an ERP 
system once it is installed, the case for a pre-ERP 
initiative is quite strong. A well implemented Lean ERP 
infrastructure is a major competitive advantage, but it 
does have to be sequenced properly. 
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Implementing Six Sigma offers many small and medium sized 
companies the same benefits as larger companies: an improved 
bottom line. Most companies today operate between three and four 
sigma, where the cost of quality is 15 to 25% of revenue (see the 
graph comparing sigma levels with cost of quality). 
As a company moves to Six Sigma quality levels, its cost of 
quality decreases to one to two percent of revenue. These 
dramatic cost savings come as quality costs move from 
"Failure Costs" (such as resolving customer complaints) to 
"Prevention Costs" (such as Six Sigma projects and other 
customer-focused activities) [30]. The modern ERP market is 
experiencing both growth and challenges. The extent of 
customization does not solely decide the success of ERP [23]. 
ERP can be the road to prosperity if one can implement a 
revolutionary approach to product and process improvement 
through the effective use of statistical methods and 
Lean Six Sigma skills [24][25]. 
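
The sigma levels discussed above are conventionally translated into defects per million opportunities (DPMO). The sketch below uses the customary 1.5-sigma long-term shift, which is a convention of Six Sigma practice and an assumption here, not something stated in this paper; the function name is likewise illustrative.

```python
from statistics import NormalDist


def dpmo(sigma_level, shift=1.5):
    """Defects per million opportunities for a given sigma level.

    Uses the one-sided normal tail beyond (sigma_level - shift),
    where shift is the conventional 1.5-sigma long-term drift.
    """
    tail = 1.0 - NormalDist().cdf(sigma_level - shift)
    return tail * 1_000_000


for level in (3, 4, 6):
    print(level, round(dpmo(level), 1))
```

Under this convention a three-sigma process produces tens of thousands of defects per million opportunities, while a six-sigma process produces only a handful, which is consistent with the drop in cost of quality described in the text.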


10. FUTURE STUDY 

This study will provide practitioners a deep insight into the 
benefits of aligning business processes with a target ERP system in 
the period prior to go-live, along with the following points: 

* Tailoring ERP system functionality to customer requirements 
[6][9]. 

* ERP system as a business tool for growth of SMEs having 
limited resources (money, people, time) with which to evaluate 
and implement ERP [12]. 

* Continuous evaluation of Critical Success Factors (CSFs) for 
various ERP software to meet essential business needs, unique 
to each business [8][11]. 

* Change Management in relation to the STOPE framework 
(Strategy, Technology, Organization, People and Environment). 

* Future direction of ERP, Project Management and Lean Six 
Sigma technology [9][22]. 
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